Novel Nucleotide and Amino Acid Sequences, and Assays and Methods of Use Thereof for Diagnosis of Lung Cancer

ABSTRACT

Novel markers for lung cancer that are both sensitive and accurate. These markers are overexpressed in lung cancer specifically, as opposed to normal lung tissue. The measurement of these markers, alone or in combination, in patient samples provides information that the diagnostician can correlate with a probable diagnosis of lung cancer. The markers of the present invention, alone or in combination, show a high degree of differential detection between lung cancer and non-cancerous states.

CROSS-REFERENCE TO RELATED APPLICATION(S)

This application is related to Novel Nucleotide and Amino Acid Sequences, and Assays and Methods of use thereof for Diagnosis of Lung Cancer, and is a continuation-in-part of U.S. Non-provisional application Ser. No. 11/051,720 filed on Jan. 27, 2005, which claims the benefit of priority from the following U.S. Provisional Applications:

-   Application No. 60/620,916 filed Oct. 22, 2004 - Differential Expression of Markers in Colon Cancer
-   Application No. 60/620,874 filed Oct. 22, 2004 - Differential Expression of Markers in Ovarian Cancer
-   Application No. 60/589,815 filed Jul. 22, 2004 - Differential Expression of Markers in Lung Cancer
-   Application No. 60/607,307 filed Sep. 7, 2004 - Differential Expression of Markers in Lung Cancer
-   Application No. 60/620,853 filed Oct. 22, 2004 - Differential Expression of Markers in Lung Cancer
-   Application No. 60/628,112 filed Nov. 17, 2004 - Differential Expression of Markers in Lung Cancer II
-   Application No. 60/539,129 filed Jan. 27, 2004 - Methods and Systems for Annotating Biomolecular Sequences

Each of the above-identified U.S. Non-provisional and U.S. Provisional Applications is incorporated herein by reference in its entirety.

FIELD OF THE INVENTION

The present invention is related to novel nucleotide and protein sequences that are diagnostic markers for lung cancer, and assays and methods of use thereof.

BACKGROUND OF THE INVENTION

Lung cancer is the primary cause of cancer death among both men and women in the U.S., with an estimated 172,000 new cases being reported in 1994. The five-year survival rate among all lung cancer patients, regardless of the stage of disease at diagnosis, is only 13%. This contrasts with a five-year survival rate of 46% among cases detected while the disease is still localized. However, only 16% of lung cancers are discovered before the disease has spread. Lung cancers are broadly classified into small cell or non-small cell lung cancers. Non-small cell lung cancers are further divided into adenocarcinomas, bronchoalveolar-alveolar, squamous cell and large cell carcinomas. Approximately 75-85 percent of lung cancers are non-small cell cancers and 15-25 percent are small cell cancers of the lung.

Early detection is difficult since clinical symptoms are often not seen until the disease has reached an advanced stage. Currently, diagnosis is aided by the use of chest x-rays, analysis of the type of cells contained in sputum and fiberoptic examination of the bronchial passages. Treatment regimens are determined by the type and stage of the cancer, and include surgery, radiation therapy and/or chemotherapy.

Early detection of primary, metastatic, and recurrent disease can significantly impact the prognosis of individuals suffering from lung cancer. Non-small cell lung cancer diagnosed at an early stage has a significantly better outcome than that diagnosed at more advanced stages. Similarly, early diagnosis of small cell lung cancer potentially has a better prognosis.

Although current radiotherapeutic agents, chemotherapeutic agents and biological toxins are potent cytotoxins, they do not discriminate between normal and malignant cells, producing adverse effects and dose-limiting toxicities. There remains a need for lung cancer specific cancer markers. There remains a need for reagents and kits which can be used to detect the presence of lung cancer markers in samples from patients. There remains a need for methods of screening and diagnosing individuals who have lung cancer and methods of monitoring response to treatment, disease progression and disease recurrence in patients diagnosed with lung cancer. There remains a need for reagents, kits and methods for determining the type of lung cancer that an individual who has lung cancer has. There remains a need for compositions which can specifically target lung cancer cells. There remains a need for imaging agents which can specifically bind to lung cancer cells. There remains a need for improved methods of imaging lung cancer cells. There remains a need for therapeutic agents which can specifically bind to lung cancer cells. There remains a need for improved methods of treating individuals who are suspected of suffering from lung cancer.

SUMMARY OF THE INVENTION

The background art does not teach or suggest markers for lung cancer that are sufficiently sensitive and/or accurate, alone or in combination.

The present invention overcomes these deficiencies of the background art by providing novel markers for lung cancer that are both sensitive and accurate. Furthermore, these markers are able to distinguish between different types of lung cancer, such as small cell or non-small cell lung cancer, and further between non-small cell lung cancer types, such as adenocarcinomas, squamous cell and large cell carcinomas. These markers are overexpressed in lung cancer specifically, as opposed to normal lung tissue. The measurement of these markers, alone or in combination, in patient (biological) samples provides information that the diagnostician can correlate with a probable diagnosis of lung cancer. The markers of the present invention, alone or in combination, show a high degree of differential detection between lung cancer and non-cancerous states.

According to preferred embodiments of the present invention, examples of suitable biological samples which may optionally be used with preferred embodiments of the present invention include but are not limited to blood, serum, plasma, blood cells, urine, sputum, saliva, stool, spinal fluid or CSF, lymph fluid, the external secretions of the skin, respiratory, intestinal, and genitourinary tracts, tears, milk, neuronal tissue, lung tissue, any human organ or tissue, including any tumor or normal tissue, any sample obtained by lavage (for example of the bronchial system or of the breast ductal system), and also samples of in vivo cell culture constituents. In a preferred embodiment, the biological sample comprises lung tissue and/or sputum and/or a serum sample and/or a urine sample and/or any other tissue or liquid sample. The sample can optionally be diluted with a suitable eluant before contacting the sample to an antibody and/or performing any other diagnostic assay.

Information given in the text with regard to cellular localization was determined according to four different software programs: (i) tmhmm (from Center for Biological Sequence Analysis, Technical University of Denmark DTU, dot cbs dot dtu dot dk/services/TMHMM/TMHMM2 dot 0b dot guide dot php) or (ii) tmpred (from EMBnet, maintained by the ISREC Bioinformatics group and the LICR Information Technology Office, Ludwig Institute for Cancer Research, Swiss Institute of Bioinformatics, dot ch dot embnet dot org/software/TMPRED_form dot html) for transmembrane region prediction; (iii) signalp_hmm or (iv) signalp_nn (both from Center for Biological Sequence Analysis, Technical University of Denmark DTU, dot cbs dot dtu dot dk/services/SignalP/background/prediction dot php) for signal peptide prediction. The terms “signalp_hmm” and “signalp_nn” refer to two modes of operation for the program SignalP: hmm refers to Hidden Markov Model, while nn refers to neural networks. Localization was also determined through manual inspection of known protein localization and/or gene structure, and the use of heuristics by the individual inventor.
In some cases, for the manual inspection of cellular localization prediction, the inventors used the ProLoc computational platform [Einat Hazkani-Covo, Erez Levanon, Galit Rotman, Dan Graur and Amit Novik; (2004) “Evolution of multicellularity in metazoa: comparative analysis of the subcellular localization of proteins in Saccharomyces, Drosophila and Caenorhabditis.” Cell Biology International 2004; 28(3):171-81], which predicts protein localization based on various parameters including protein domains (e.g., prediction of trans-membranous regions and localization thereof within the protein), pI, protein length, amino acid composition, homology to pre-annotated proteins, recognition of sequence patterns which direct the protein to a certain organelle (such as nuclear localization signal, NLS, mitochondria localization signal), signal peptide and anchor modeling, and using unique domains from Pfam that are specific to a single compartment.

Information is given in the text with regard to SNPs (single nucleotide polymorphisms). A description of the abbreviations is as follows. “T→C”, for example, means that the SNP results in a change at the position given in the table from T to C. Similarly, “M→Q”, for example, means that the SNP has caused a change in the corresponding amino acid sequence, from methionine (M) to glutamine (Q). If, in place of a letter at the right hand side for the nucleotide sequence SNP, there is a space, it indicates that a frameshift has occurred. A frameshift may also be indicated with a hyphen (-). A stop codon is indicated with an asterisk at the right hand side (*). As part of the description of an SNP, a comment may be found in parentheses after the above description of the SNP itself. This comment may include an FTId, which is an identifier to a SwissProt entry that was created with the indicated SNP. An FTId is a unique and stable feature identifier, which allows construction of links directly from position-specific annotation in the feature table to specialized protein-related databases. The FTId is always the last component of a feature in the description field, as follows: FTId=XXX_number, in which XXX is the 3-letter code for the specific feature key, separated by an underscore from a 6-digit number. In the table of the amino acid mutations of the wild type proteins of the selected splice variants of the invention, the header of the first column is “SNP position(s) on amino acid sequence”, representing a position of a known mutation on amino acid sequence. SNPs may optionally be used as diagnostic markers according to the present invention, alone or in combination with one or more other SNPs and/or any other diagnostic marker.
Preferred embodiments of the present invention comprise such SNPs, including but not limited to novel SNPs on the known (WT or wild type) protein sequences given below, as well as novel nucleic acid and/or amino acid sequences formed through such SNPs, and/or any SNP on a variant amino acid and/or nucleic acid sequence described herein.
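
By way of non-limiting illustration only, the SNP notation described above may be read programmatically as in the following sketch. The function name `parse_snp`, the regular expressions and the returned dictionary keys are hypothetical and were chosen for this example; they are not part of any annotation software referred to herein. The sketch assumes a description of the form “X→Y”, optionally followed by a comment in parentheses that may contain an FTId, and it treats an empty or hyphen right hand side as a frameshift regardless of sequence type.

```python
import re

# Hypothetical helper for illustration; names and return keys are assumptions.
# Left side: one letter. Right side: a letter, '*', '-', or nothing (space).
SNP_RE = re.compile(r"^\s*([A-Za-z])\s*→\s*([A-Za-z*\-]?)\s*(?:\((.*)\))?\s*$")
# FTId=XXX_number: 3-letter feature key, underscore, 6-digit number.
FTID_RE = re.compile(r"FTId=([A-Z]{3}_\d{6})")


def parse_snp(description):
    """Split an SNP description such as 'T → C' into its components."""
    match = SNP_RE.match(description)
    if match is None:
        raise ValueError(f"unrecognized SNP description: {description!r}")
    original, changed, comment = match.groups()
    if changed in ("", "-"):
        effect = "frameshift"      # space or hyphen at the right hand side
    elif changed == "*":
        effect = "stop"            # asterisk marks a stop codon
    else:
        effect = "substitution"
    ftid = None
    if comment:
        found = FTID_RE.search(comment)
        if found:
            ftid = found.group(1)
    return {"from": original, "to": changed or None,
            "effect": effect, "ftid": ftid}


if __name__ == "__main__":
    for text in ("T → C", "M → Q (FTId=VAR_012345)", "G → ", "A → *"):
        print(text, "=>", parse_snp(text))
```

The FTId value `VAR_012345` in the usage lines is a made-up placeholder that merely follows the XXX_number pattern.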

Information given in the text with regard to homology to the known proteins was determined by Smith-Waterman version 5.1.2 using special (non-default) parameters as follows:

-   model=sw.model
-   GAPEXT=0
-   GAPOP=100.0
-   MATRIX=blosum100
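
By way of non-limiting illustration only, the effect of a high gap-opening penalty combined with a zero gap-extension penalty may be seen in the following sketch of Smith-Waterman local alignment scoring with affine gaps. This is not the Smith-Waterman version 5.1.2 program referred to above. The function names are hypothetical, a toy match/mismatch score is used in place of the blosum100 matrix, and the gap convention (a gap of length k costs gap_open + (k - 1) × gap_ext) is an assumption, since the convention used by the actual program is not stated herein.

```python
def toy_score(x, y):
    """Toy substitution score (assumption): +5 for a match, -4 for a mismatch."""
    return 5.0 if x == y else -4.0


def smith_waterman_score(a, b, score=toy_score, gap_open=100.0, gap_ext=0.0):
    """Return the best local alignment score of a and b with affine gaps."""
    neg_inf = float("-inf")
    rows, cols = len(a) + 1, len(b) + 1
    best_any = [[0.0] * cols for _ in range(rows)]      # best score ending at (i, j)
    gap_in_a = [[neg_inf] * cols for _ in range(rows)]  # alignment ends with a gap in a
    gap_in_b = [[neg_inf] * cols for _ in range(rows)]  # alignment ends with a gap in b
    best = 0.0
    for i in range(1, rows):
        for j in range(1, cols):
            # Either open a new gap or extend an existing one.
            gap_in_a[i][j] = max(best_any[i][j - 1] - gap_open,
                                 gap_in_a[i][j - 1] - gap_ext)
            gap_in_b[i][j] = max(best_any[i - 1][j] - gap_open,
                                 gap_in_b[i - 1][j] - gap_ext)
            diagonal = best_any[i - 1][j - 1] + score(a[i - 1], b[j - 1])
            # Local alignment: scores are never allowed to drop below zero.
            best_any[i][j] = max(0.0, diagonal, gap_in_a[i][j], gap_in_b[i][j])
            best = max(best, best_any[i][j])
    return best


if __name__ == "__main__":
    s1, s2 = "AAAATTTT", "AAAACTTTT"
    print("gap_open=100:", smith_waterman_score(s1, s2))
    print("gap_open=3:  ", smith_waterman_score(s1, s2, gap_open=3.0))
```

With the toy scores, a gap-opening penalty of 100 makes any gapped alignment of short sequences worse than an ungapped one, so the reported homology in this sketch is driven by ungapped local matches.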

Information is given with regard to overexpression of a cluster in cancer based on ESTs. A key to the p values with regard to the analysis of such overexpression is as follows:

-   library-based statistics: P-value without including the level of expression in cell-lines (P1)
-   library-based statistics: P-value including the level of expression in cell-lines (P2)
-   EST clone statistics: P-value without including the level of expression in cell-lines (SP1)
-   EST clone statistics: predicted overexpression ratio without including the level of expression in cell-lines (R3)
-   EST clone statistics: P-value including the level of expression in cell-lines (SP2)
-   EST clone statistics: predicted overexpression ratio including the level of expression in cell-lines (R4)

Library-based statistics refer to statistics over an entire library, while EST clone statistics refer to expression only for ESTs from a particular tissue or cancer.

Information is given with regard to overexpression of a cluster in cancer based on microarrays. As a microarray reference, in the specific segment paragraphs, the unabbreviated tissue name was used as the reference to the type of chip for which expression was measured. There are two types of microarray results: those from microarrays prepared according to a design by the present inventors, for which the microarray fabrication procedure is described in detail in the Materials and Experimental Procedures section herein; and those results from microarrays using Affymetrix technology. For microarrays prepared according to a design by the present inventors, the probe name begins with the name of the cluster (gene), followed by an identifying number. Oligonucleotide microarray results taken from Affymetrix data were from chips available from Affymetrix Inc, Santa Clara, Calif., USA (see for example data regarding the Human Genome U133 (HG-U133) Set at dot affymetrix dot com/products/arrays/specific/hgu133 dot affx; GeneChip Human Genome U133A 2.0 Array at dot affymetrix dot com/products/arrays/specific/hgu133av2 dot affx; and Human Genome U133 Plus 2.0 Array at dot affymetrix dot com/products/arrays/specific/hgu133plus dot affx). The probe names follow the Affymetrix naming convention. The data is available from NCBI Gene Expression Omnibus (see dot ncbi dot nlm dot nih dot gov/projects/geo/ and Edgar et al, Nucleic Acids Research, 2002, Vol. 30, No. 1 207-210). The dataset (including results) is available from dot ncbi dot nlm dot nih dot gov/geo/query/acc dot cgi?acc=GSE1133 for the Series GSE1133 database (published in March 2004); a reference to these results is as follows: Su et al (Proc Natl Acad Sci USA. 2004 Apr. 20; 101(16):6062-7. Epub 2004 Apr. 9). Probes designed by the present inventors are listed below.

>H61775_0_11_0 (SEQ ID NO: 204)
CCCCAGCTTTTATAGAGCGGCCCAAGGAAGAATATTTCCAAGAAGTAGGG
>M85491_0_0_25999 (SEQ ID NO: 205)
GACATCTTTGCATATCATGTCAGAGCTATAACATCATTGTGGAGAAGCTC
>M85491_0_14_0 (SEQ ID NO: 206)
GTCATGAAAATCAACACCGAGGTGCGGAGCTTCGGACCTGTGTCCCGCAG
>Z21368_0_0_61857 (SEQ ID NO: 207)
AGTTCATCCTTCTTCAGTGTGACCAGTAAATTCTTCCCATACTCTTGAAG
>HUMGRP5E_0_0_16630 (SEQ ID NO: 208)
GCTGATATGGAAGTTGGGGAATCTGAATTGCCAGAGAATCTTGGGAAGAG
>HUMGRP5E_0_2_0 (SEQ ID NO: 209)
TCTCATAGAAGCAAAGGAGAACAGAAACCACCAGCCACCTCAACCCAAGG
>D56406_0_5_0 (SEQ ID NO: 210)
TCTGACTTTTACGGACTTGGCTTGTTAGAAGGCTGAAAGATGATGGCAGG
>F05068_0_0_5744 (SEQ ID NO: 211)
ACGGGAGGGAAGGAAGGTGTGCGGGAGGAGTTCTCTGTCTCCACTCCCCT
>F05068_0_0_5754 (SEQ ID NO: 212)
CAAGGGGAACTGACCGTTGGTCCCGAAGGTCTAGAAGTGAATGGGAGCAG
>F05068_0_8_0 (SEQ ID NO: 213)
CTGGGGTTGGACTTCGGAGTTTTGCCATTGCCAGTGGGACGTCTGAGACT
>F05068_0_1_5751 (SEQ ID NO: 214)
TCTTAGCAGGTAGGTGCCGCAGACCCTGCGGGTTAAGAGGTGGGGTGGGG
>H38804_0_3_0 (SEQ ID NO: 215)
CGTAATTGCAGTGCATTTAGACAGGCATCTATTTGGACCTGTTTCTATCT
>HSENA78_0_1_0 (SEQ ID NO: 216)
TGAAGAGTGTGAGGAAAACCTATGTTTGCCGCTTAAGCTTTCAGCTCAGC
>R00299_0_8_0 (SEQ ID NO: 217)
CCAAGGCTCGTCTGCGCACCTTGTGTCTTGTAGGGTATGGTATGTGGGAC
>Z44808_0_8_0 (SEQ ID NO: 218)
AAAAGCATGAGTTTCTGACCAGCGTTCTGGACGCGCTGTCCACGGACATG
>Z44808_0_0_72347 (SEQ ID NO: 219)
ATGTTCTTAGGAGGCAAGCCAGGAGAAGCCGGGTCTGACTTTTCAGCTCA
>Z44808_0_0_72349 (SEQ ID NO: 220)
TCCTCCAGACCCAAAGCCACAACCCATCGCAAGTCAAGAACACTTTCCAG
>AA161187_0_0_433 (SEQ ID NO: 221)
ACCCTGGGTGGGCAAAAACGTGCTTTCCCGGACGGGGTTGAAGGGGAGAA
>AA161187_0_0_430 (SEQ ID NO: 222)
TGGAGACTGTTGCCCCACTCTGCAGATGCAGAAACGGAGGCTTGGCTGCT
>R66178_0_7_0 (SEQ ID NO: 223)
CCAGTGTGGTATCCTGGGAAACTCGGTTAAAAGGTGAGGCAGAGTACCAG
>HUMPHOSLIP_0_0_18458 (SEQ ID NO: 224)
AAGGAAGCAGGACCAGTGGATGTGAGGCGTGGTCGAAGAACAACAGAAAG
>HUMPHOSLIP_0_0_18487 (SEQ ID NO: 225)
ACAGGGGCCAGATGGTGACCCATGACCCAGCCTAAAAGGCAGCCAGAGGG
>A1076020_0_3_0 (SEQ ID NO: 226)
ATCAGCACTGCCACCTACACCACGGTGCCGCGCGTGGCCTTCTACGCCGG
>T23580_0_0_902 (SEQ ID NO: 227)
GTGAAACCCCATTGGCTTCATTGGCTCCTTGATTTAAACCACGCCCGGCT
>T23580_0_0_901 (SEQ ID NO: 228)
TGAGTCCGTGTTATATCATGTGGTCTCATTGATAGGCGGGATAGGGAGGG
>M79217_0_9_0 (SEQ ID NO: 229)
TTTGTGGAATAGCAACCCATGGTTATGGCGAGTGACCCGACGTGATCTGG
>M62096_0_0_20588 (SEQ ID NO: 230)
AAGGCTTAGGTGCAAAGCCATTGGATACCATACCTGAGACCACACAGCCA
>M62096_0_7_0 (SEQ ID NO: 231)
ACCAGAAGCAGCTGTCCAGACTCCGAGACGAAATTGAGGAGAAGCAGAAA
>M78076_0_7_0 (SEQ ID NO: 232)
GAGAAGATGAACCCGCTGGAACAGTATGAGCGAAAGGTGAATGCGTCTGT
>T99080_0_0_58896 (SEQ ID NO: 233)
AACTCACAGCAAGAGCTGTGTTCCAGTTAGCTTTGCTACCAGTTATGCAG
>T08446_0_9_0 (SEQ ID NO: 234)
CATTTCCACTACGAGAACGTTGACTTTGGCCACATTCAGCTCCTGCTGTC
>HUMCA1XIA_0_0_14909 (SEQ ID NO: 235)
GCTGCAATCTAAGTTTCGGAATACTTATACCACTCCAGAAATAATCCTCG
>HUMCA1XIA_0_18_0 (SEQ ID NO: 236)
TTCAGAACTGTTAACATCGCTGACGGGAAGTGGCATCGGGTAGCAATCAG
>T11628_0_9_0 (SEQ ID NO: 237)
ACAAGATCCCCGTGAAGTACCTGGAGTTCATCTCGGAATGCATCATCCAG
>T11628_0_0_45174 (SEQ ID NO: 238)
TAAACAATCAAAGAGCATGTTGGCCTGGTCCTTTGCTAGGTACTGTAGAG
>T11628_0_0_45161 (SEQ ID NO: 239)
TGCCTCGCCACAATGGCACCTGCCCTAAAATAGCTTCCCATGTGAGGGCT
>HUMCEA_0_0_96 (SEQ ID NO: 240)
CAAGAGGGGTTTGGCTGAGACTTTAGGATTGTGATTCAGCTTAGAGGGAC
>HUMCEA_0_0_15183 (SEQ ID NO: 241)
CCTGGTGGGAGCCCATGAGAAGCGAGTTCTCTGTGCAACGGACTTAGTAA
>HUMCEA_0_0_15182 (SEQ ID NO: 242)
GCTCCCTGGAGCATCAGCATCATATTCTGGGGTGGAGTCTATCTGGTTCT
>HUMCEA_0_0_15168 (SEQ ID NO: 243)
TCCTGCCTGTCACCTGAAGTTCTAGATCATTCCCTGGACTCCACTCTATC
>HUMCEA_0_0_15180 (SEQ ID NO: 244)
TTTAACACAGGATTGGGACAGGATTCAGAGGGACACTGTGGCCCTTCTAC
>R35137_0_5_0 (SEQ ID NO: 245)
TATGTGGAGGTGGTGAACATGGACGCTGCAGTGCAGCAGCAGATGCTGAA
>Z25299_0_3_0 (SEQ ID NO: 246)
AACTCTGGCACCTTGGGCTGTGGAAGGCTCTGGAAAGTCCTTCAAAGCTG
>HSSTROL3_0_0_12518 (SEQ ID NO: 247)
ATGAGAGTAACCTCACCCGTGCACTAGTTTACAGAGCATTCACTGCCCCA
>HSSTROL3_0_0_12517 (SEQ ID NO: 248)
CAGAGATGAGAGCCTGGAGCATTGCAGATGCCAGGGACTTCACAAATGAA
>HSS100PCB_0_0_12280 (SEQ ID NO: 249)
CTCAAAATGAAACTCCCTCTCGCAGAGCACAATTCCAATTCGCTCTAAAA
>R20779_0_0_30670 (SEQ ID NO: 250)
CCGCGTTGCTTCTAGAGGCTGAATGCCTTTCAAATGGAGAAGGCTTCCAT
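
By way of non-limiting illustration only, probe records of the kind listed above (a “>” header carrying the probe name and SEQ ID NO, followed by the probe sequence) may be read as in the following sketch. The function name `parse_probes` and the dictionary keys are hypothetical. The sketch assumes one header line followed by one sequence line per probe, and assumes that the cluster name is the portion of the probe name before the first underscore; neither assumption is stated herein. The two sample records are copied from the list above.

```python
import re

# Header such as: >H61775_0_11_0 (SEQ ID NO: 204)
HEADER_RE = re.compile(r"^>(\S+)\s*\(SEQ ID NO:\s*(\d+)\)\s*$")

SAMPLE = """\
>H61775_0_11_0 (SEQ ID NO: 204)
CCCCAGCTTTTATAGAGCGGCCCAAGGAAGAATATTTCCAAGAAGTAGGG
>HUMGRP5E_0_2_0 (SEQ ID NO: 209)
TCTCATAGAAGCAAAGGAGAACAGAAACCACCAGCCACCTCAACCCAAGG
"""


def parse_probes(text):
    """Parse header/sequence line pairs into a list of probe dictionaries."""
    probes = []
    pending = None  # (probe name, SEQ ID NO) waiting for its sequence line
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            match = HEADER_RE.match(line)
            if match is None:
                raise ValueError(f"malformed probe header: {line!r}")
            pending = (match.group(1), int(match.group(2)))
        else:
            if pending is None:
                raise ValueError("sequence line without a preceding header")
            name, seq_id = pending
            probes.append({
                "name": name,
                # Assumption: the cluster name precedes the first underscore.
                "cluster": name.split("_", 1)[0],
                "seq_id": seq_id,
                "sequence": line.upper(),
            })
            pending = None
    return probes


if __name__ == "__main__":
    for probe in parse_probes(SAMPLE):
        print(probe["cluster"], probe["seq_id"], len(probe["sequence"]))
```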

The following list of abbreviations for tissues was used in the TAA histograms. The term “TAA” stands for “Tumor Associated Antigen”, and the TAA histograms, given in the text, represent the cancerous tissue expression pattern as predicted by the biomarkers selection engine, as described in detail in examples 1-5 below:

-   “BONE” for “bone”;
-   “COL” for “colon”;
-   “EPI” for “epithelial”;
-   “GEN” for “general”;
-   “LIVER” for “liver”;
-   “LUN” for “lung”;
-   “LYMPH” for “lymph nodes”;
-   “MARROW” for “bone marrow”;
-   “OVA” for “ovary”;
-   “PANCREAS” for “pancreas”;
-   “PRO” for “prostate”;
-   “STOMACH” for “stomach”;
-   “TCELL” for “T cells”;
-   “THYROID” for “thyroid”;
-   “MAM” for “breast”;
-   “BRAIN” for “brain”;
-   “UTERUS” for “uterus”;
-   “SKIN” for “skin”;
-   “KIDNEY” for “kidney”;
-   “MUSCLE” for “muscle”;
-   “ADREN” for “adrenal”;
-   “HEAD” for “head and neck”;
-   “BLADDER” for “bladder”.

It should be noted that the terms “segment”, “seg” and “node” are used interchangeably in reference to nucleic acid sequences of the present invention; they refer to portions of nucleic acid sequences that were shown to have one or more properties as described below. They are also the building blocks that were used to construct complete nucleic acid sequences as described in greater detail below. Optionally and preferably, they are examples of oligonucleotides which are embodiments of the present invention, for example as amplicons, hybridization units and/or from which primers and/or complementary oligonucleotides may optionally be derived, and/or for any other use.

As used herein the phrase “lung cancer” refers to cancers of the lung including small cell lung cancer and non-small cell lung cancer, including but not limited to lung adenocarcinoma, squamous cell carcinoma, and adenocarcinoma.

The term “marker” in the context of the present invention refers to a nucleic acid fragment, a peptide, or a polypeptide, which is differentially present in a sample taken from subjects (patients) having lung cancer (or one of the above indicative conditions) as compared to a comparable sample taken from subjects who do not have lung cancer (or one of the above indicative conditions).

The phrase “differentially present” refers to differences in the quantity of a marker present in a sample taken from patients having lung cancer (or one of the above indicative conditions) as compared to a comparable sample taken from patients who do not have lung cancer (or one of the above indicative conditions). For example, a nucleic acid fragment may optionally be differentially present between the two samples if the amount of the nucleic acid fragment in one sample is significantly different from the amount of the nucleic acid fragment in the other sample, for example as measured by hybridization and/or NAT-based assays. A polypeptide is differentially present between the two samples if the amount of the polypeptide in one sample is significantly different from the amount of the polypeptide in the other sample. It should be noted that if the marker is detectable in one sample and not detectable in the other, then such a marker can be considered to be differentially present.

As used herein the phrase “diagnostic” means identifying the presence or nature of a pathologic condition. Diagnostic methods differ in their sensitivity and specificity. The “sensitivity” of a diagnostic assay is the percentage of diseased individuals who test positive (percent of “true positives”). Diseased individuals not detected by the assay are “false negatives.” Subjects who are not diseased and who test negative in the assay are termed “true negatives.” The “specificity” of a diagnostic assay is 1 minus the false positive rate, where the “false positive” rate is defined as the proportion of those without the disease who test positive. While a particular diagnostic method may not provide a definitive diagnosis of a condition, it suffices if the method provides a positive indication that aids in diagnosis.
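
By way of non-limiting illustration only, the definitions of sensitivity and specificity given above may be expressed as in the following sketch. The function names are hypothetical and the counts are invented for the example; they are not data from the present invention. The functions return fractions, which may be multiplied by 100 to give percentages.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of diseased individuals who test positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """1 minus the false positive rate among individuals without the disease."""
    false_positive_rate = false_pos / (false_pos + true_neg)
    return 1.0 - false_positive_rate


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    tp, fn = 90, 10    # diseased individuals: detected / missed by the assay
    tn, fp = 160, 40   # non-diseased individuals: negative / positive in the assay
    print(f"sensitivity = {sensitivity(tp, fn):.1%}")
    print(f"specificity = {specificity(tn, fp):.1%}")
```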

As used herein the phrase “diagnosing” refers to classifying a disease or a symptom, determining a severity of the disease, monitoring disease progression, forecasting an outcome of a disease and/or prospects of recovery. The term “detecting” may also optionally encompass any of the above.

Diagnosis of a disease according to the present invention can be effected by determining a level of a polynucleotide or a polypeptide of the present invention in a biological sample obtained from the subject, wherein the level determined can be correlated with predisposition to, or presence or absence of the disease. It should be noted that a “biological sample obtained from the subject” may also optionally comprise a sample that has not been physically removed from the subject, as described in greater detail below.

As used herein, the term “level” refers to expression levels of RNA and/or protein or to DNA copy number of a marker of the present invention.

Typically the level of the marker in a biological sample obtained from the subject is different (i.e., increased or decreased) from the level of the same variant in a similar sample obtained from a healthy individual (examples of biological samples are described herein).

Numerous well known tissue or fluid collection methods can be utilized to collect the biological sample from the subject in order to determine the level of DNA, RNA and/or polypeptide of the variant of interest in the subject.

Examples include, but are not limited to, fine needle biopsy, needle biopsy, core needle biopsy and surgical biopsy (e.g., brain biopsy), and lavage. Regardless of the procedure employed, once a biopsy/sample is obtained the level of the variant can be determined and a diagnosis can thus be made.

Determining the level of the same variant in normal tissues of the same origin is preferably effected alongside, to detect an elevated expression and/or amplification and/or a decreased expression of the variant as opposed to the normal tissues.

A “test amount” of a marker refers to an amount of a marker in a subject's sample that is consistent with a diagnosis of lung cancer (or one of the above indicative conditions). A test amount can be either an absolute amount (e.g., microgram/ml) or a relative amount (e.g., relative intensity of signals).

A “control amount” of a marker can be any amount or a range of amounts to be compared against a test amount of a marker. For example, a control amount of a marker can be the amount of a marker in a patient with lung cancer (or one of the above indicative conditions) or a person without lung cancer (or one of the above indicative conditions). A control amount can be either an absolute amount (e.g., microgram/ml) or a relative amount (e.g., relative intensity of signals).

“Detect” refers to identifying the presence, absence or amount of the object to be detected.

A “label” includes any moiety or item detectable by spectroscopic, photochemical, biochemical, immunochemical, or chemical means. For example, useful labels include ³²P, ³⁵S, fluorescent dyes, electron-dense reagents, enzymes (e.g., as commonly used in an ELISA), biotin-streptavidin, digoxigenin, haptens and proteins for which antisera or monoclonal antibodies are available, or nucleic acid molecules with a sequence complementary to a target. The label often generates a measurable signal, such as a radioactive, chromogenic, or fluorescent signal, that can be used to quantify the amount of bound label in a sample. The label can be incorporated in or attached to a primer or probe either covalently, or through ionic, van der Waals or hydrogen bonds, e.g., incorporation of radioactive nucleotides, or biotinylated nucleotides that are recognized by streptavidin. The label may be directly or indirectly detectable. Indirect detection can involve the binding of a second label to the first label, directly or indirectly. For example, the label can be the ligand of a binding partner, such as biotin, which is a binding partner for streptavidin, or a nucleotide sequence, which is the binding partner for a complementary sequence, to which it can specifically hybridize. The binding partner may itself be directly detectable, for example, an antibody may be itself labeled with a fluorescent molecule. The binding partner also may be indirectly detectable, for example, a nucleic acid having a complementary nucleotide sequence can be a part of a branched DNA molecule that is in turn detectable through hybridization with other labeled nucleic acid molecules (see, e.g., P. D. Fahrlander and A. Klausner, Bio/Technology 6:1165 (1988)). Quantitation of the signal is achieved by, e.g., scintillation counting, densitometry, or flow cytometry.

Exemplary detectable labels, optionally and preferably for use with immunoassays, include but are not limited to magnetic beads, fluorescent dyes, radiolabels, enzymes (e.g., horseradish peroxidase, alkaline phosphatase and others commonly used in an ELISA), and colorimetric labels such as colloidal gold or colored glass or plastic beads. Alternatively, the marker in the sample can be detected using an indirect assay, wherein, for example, a second, labeled antibody is used to detect bound marker-specific antibody, and/or in a competition or inhibition assay wherein, for example, a monoclonal antibody which binds to a distinct epitope of the marker is incubated simultaneously with the mixture.

“Immunoassay” is an assay that uses an antibody to specifically bind an antigen. The immunoassay is characterized by the use of specific binding properties of a particular antibody to isolate, target, and/or quantify the antigen.

The phrase “specifically (or selectively) binds” to an antibody or “specifically (or selectively) immunoreactive with,” when referring to a protein or peptide (or other epitope), refers to a binding reaction that is determinative of the presence of the protein in a heterogeneous population of proteins and other biologics. Thus, under designated immunoassay conditions, the specified antibodies bind to a particular protein at least two times greater than the background (non-specific signal) and do not substantially bind in a significant amount to other proteins present in the sample. Specific binding to an antibody under such conditions may require an antibody that is selected for its specificity for a particular protein. For example, polyclonal antibodies raised to seminal basic protein from specific species such as rat, mouse, or human can be selected to obtain only those polyclonal antibodies that are specifically immunoreactive with seminal basic protein and not with other proteins, except for polymorphic variants and alleles of seminal basic protein. This selection may be achieved by subtracting out antibodies that cross-react with seminal basic protein molecules from other species. A variety of immunoassay formats may be used to select antibodies specifically immunoreactive with a particular protein. For example, solid-phase ELISA immunoassays are routinely used to select antibodies specifically immunoreactive with a protein (see, e.g., Harlow & Lane, Antibodies, A Laboratory Manual (1988), for a description of immunoassay formats and conditions that can be used to determine specific immunoreactivity). Typically a specific or selective reaction will be at least twice background signal or noise and more typically more than 10 to 100 times background.
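
By way of non-limiting illustration only, the “at least twice background” criterion described above may be checked as in the following sketch. The function names, the default threshold parameter and the example readings are hypothetical; the readings are invented numbers in arbitrary units and are not assay data.

```python
def fold_over_background(signal, background):
    """Ratio of the measured signal to the non-specific background signal."""
    if background <= 0:
        raise ValueError("background must be a positive reading")
    return signal / background


def is_specific(signal, background, min_fold=2.0):
    """True if the signal is at least min_fold times the background."""
    return fold_over_background(signal, background) >= min_fold


if __name__ == "__main__":
    # Hypothetical readings (arbitrary units) for illustration only.
    for signal, background in ((0.9, 0.3), (0.5, 0.3), (6.0, 0.3)):
        fold = fold_over_background(signal, background)
        print(f"signal={signal} background={background} "
              f"fold={fold:.1f} specific={is_specific(signal, background)}")
```

A stricter threshold, such as 10, may be passed as `min_fold` to reflect the more typical range mentioned above.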

According to preferred embodiments of the present invention, preferably any of the above nucleic acid and/or amino acid sequences further comprises any sequence having at least about 70%, preferably at least about 80%, more preferably at least about 90%, most preferably at least about 95% homology thereto.

Unless otherwise noted, all experimental data relates to variants of the present invention, named according to the segment being tested (as expression was tested through RT-PCR as described).

All nucleic acid sequences and/or amino acid sequences shown herein as embodiments of the present invention relate to their isolated form, as isolated polynucleotides (including for all transcripts), oligonucleotides (including for all segments, amplicons and primers), peptides (including for all tails, bridges, insertions or heads, optionally including other antibody epitopes as described herein) and/or polypeptides (including for all proteins). It should be noted that oligonucleotide and polynucleotide, or peptide and polypeptide, may optionally be used interchangeably.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 1 and 2.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 1022, 1023, 1024, 1025, 1026 and 1027.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NOs: 1281 and 1282.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 3 and 4.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 1028, 1029, 1030, 1031, 1032, 1033, 1034, 1035, 1036, 1037 and 1038.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NOs: 1283 and 1284.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 5, 6, 7 and 8.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 1039, 1040, 1041, 1042, 1043, 1044, 1045, 1046, 1047, 1048, 1049, 1050, 1051, 1052, 1053, 1054, 1055, 1056, 1057, 1058, 1059, 1060, 1061, 1062, 1063, 1064, 1065 and 1066.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NOs: 1285, 1286, 1287 and 1288.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 9, 10, 11, 12, 13, 14 and 15.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 1067, 1068, 1069, 1070, 1071, 1072, 1073, 1074, 1075, 1076, 1077, 1078, 1079, 1080, 1081, 1082, 1083, 1084, 1085, 1086, 1087, 1088, 1089, 1090, 1091, 1092, 1093, 1094, 1095, 1096, 1097, 1098, 1099 and 1100.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NOs: 1289, 1290, 1291, 1292, 1293 and 1294.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 20 and 21.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 1130, 1131, 1132, 1133 and 1134.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NOs: 1299 and 1300.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 22, 23 and 24.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 1135, 1136, 1137, 1138, 1139, 1140, 1141, 1142, 1143 and 1144.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NOs: 1301, 1302 and 1303.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 25, 26 and 27.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 1145, 1146, 1147, 1148, 1149, 1150, 1151, 1152, 1153, 1154, 1155 and 1156.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NOs: 1304 and 1305.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NO: 28.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 1157, 1158, 1159, 1160, 1161, 1162, 1163, 1164, 1165, 1166, 1167, 1168, 1169, 1170 and 1171.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NO: 1306.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 29 and 30.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 1172, 1173, 1174, 1175, 1176, 1177, 1178, 1179, 1180, 1181, 1182, 1183, 1184, 1185, 1186, 1187, 1188, 1189, 1190 and 1191.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NOs: 1307 and 1308.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NO: 31.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 1192, 1193, 1194, 1195, 1196, 1197 and 1198.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NO: 1309.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NO: 32.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 1199, 1200, 1201, 1202, 1203, 1204, 1205, 1206, 1207, 1208, 1209, 1210, 1211, 1212, 1213, 1214 and 1215.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NO: 1310.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NO: 33.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 1216, 1217, 1218, 1219, 1220, 1221, 1222, 1223, 1224, 1225, 1226 and 1227.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NO: 1311.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NO: 34.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 1228, 1229, 1230, 1231, 1232 and 1233.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NO: 1312.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NO: 35.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 1234, 1235, 1236, 1237, 1238, 1239, 1240, 1241, 1242, 1243, 1244, 1245, 1246, 1247, 1248, 1249, 1250, 1251, 1252, 1253 and 1254.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NO: 1313.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 36, 37, 38, 39 and 40.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 1255, 1256, 1257, 1258, 1259, 1260, 1261, 1262, 1263, 1264, 1265, 1266, 1267, 1268, 1269, 1270, 1271, 1272, 1273, 1274 and 1275.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NOs: 1314, 1315, 1316 and 1317.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 125, 126, 127, 128, 129 and 130.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 887, 888, 889, 890, 891, 892, 893, 894, 895, 896, 897, 898, 899, 900, 901 and 902.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NOs: 1394, 1395, 1396, 1397 and 1398.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 131 and 132.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 903, 904, 905, 906, 907, 908 and 909.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NOs: 1399 and 1400.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 99, 100, 101 and 102.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 742, 743, 744, 745, 746, 747, 748, 749, 750, 751, 752, 753, 754, 755, 756, 757, 758, 759, 760, 761, 762, 763, 764, 765, 766, 767, 768, 769, 770, 771, 772, 773, 774, 775, 776, 777, 778, 779, 780, 781, 782, 783, 784, 785, 786, 787 and 788.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NOs: 1372, 1373, 1374 and 1375.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NO: 134.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 913, 914, 915, 916, 917, 918, 919, 920, 921, 922, 923, 924, 925, 926, 927, 928, 929, 930, 931, 932, 933, 934, 935 and 936.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NO: 1402.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NO: 133.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 910, 911 and 912.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 141, 142 and 143.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 961, 962, 963, 964, 965, 966, 967, 968, 969, 970, 971, 972, 973, 974, 975, 976, 977, 978, 979, 980, 981, 982, 983, 984, 985, 986, 987, 988, 989 and 990.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising:

Protein Name

HUMOSTRO_PEA_1_PEA_1_P21 (SEQ ID NO:1627)

HUMOSTRO_PEA_1_PEA_1_P25 (SEQ ID NO:1628)

HUMOSTRO_PEA_1_PEA_1_P30 (SEQ ID NO:1629)

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 51, 52, 53, 54, 55, 56 and 57.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 568, 569 and 570.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NOs: 1327, 1328, 1329, 1330, 1331, 1332 and 1333.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 135, 136, 137, 138, 139 and 140.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 937, 938, 939, 940, 941, 942, 943, 944, 945, 946, 947, 948, 949, 950, 951, 952, 953, 954, 955, 956, 957, 958, 959 and 960.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NOs: 1403, 1404, 1405, 1406, 1407 and 1408.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 41, 42, 43, 44, 45, 46 and 47.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500 and 501.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NOs: 1318, 1319, 1320, 1321, 1322 and 1323.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 121, 122, 123 and 124.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 876, 877, 878, 879, 880, 881, 882, 883, 884, 885 and 886.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NOs: 1390, 1391, 1392 and 1393.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 48, 49 and 50.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 514, 515, 516 and 517.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NOs: 1324, 1325 and 1326.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 1464 and 1465.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 1276, 1277, 1278, 1279 and 1280.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NO: 1415.

Protein Name Corresponding Transcript(s)

HSU33147_PEA_1_P5 HSU33147_PEA_1_T1 (SEQ ID NO:1464); HSU33147_PEA_1_T2 (SEQ ID NO:1465)

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NO: 58.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 571, 572, 573, 574, 575, 576, 577 and 578.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NO: 1334.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 74, 75, 76, 77, 78, 79, 80, 81 and 82.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 659, 660, 661, 662, 663, 664, 665, 666, 667, 668, 669, 670, 671, 672, 673, 674, 675, 676, 677, 678, 679, 680, 681, 682, 683, 684, 685, 686, 687, 688, 689, 690, 691, 692 and 693.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NOs: 1350, 1351, 1352, 1353, 1354, 1355, 1356 and 1357.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising:

Transcript Name

T23580_T10 (SEQ ID NO:1626)

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 579, 580, 581, 582 and 583.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NO: 1335.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 59, 60, 61, 62, 63 and 64.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614 and 615.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NOs: 1336, 1337, 1338, 1339 and 1340.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 65, 66, 67, 68, 69, 70, 71, 72 and 73.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, 630, 631, 632, 633, 634, 635, 636, 637, 638, 639, 640, 641, 642, 643, 644, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654, 655, 656, 657, 658 and 659.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NOs: 1341, 1342, 1343, 1344, 1345, 1346, 1347, 1348 and 1349.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95 and 96.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 695, 696, 697, 698, 699, 700, 701, 702, 703, 704 and 705.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NOs: 1358, 1359, 1360, 1361, 1362, 1363, 1364, 1365, 1366, 1367, 1368 and 1369.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 97 and 98.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 706, 707, 708, 709, 710, 711, 712, 713, 714, 715, 716, 717, 718, 719, 720, 721, 722, 723, 724, 725, 726, 727, 728, 729, 730, 731, 732, 733, 734, 735, 736, 737, 738, 739, 740 and 741.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NOs: 1370 and 1371.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 103, 104, 105, 106, 107 and 108.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 789, 790, 791, 792, 793, 794, 795, 796, 797, 798, 799, 800, 801, 802, 803, 804, 805, 806, 807, 808, 809, 810, 811, 812 and 813.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NOs: 1376, 1377, 1378 and 1379.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 114, 115, 116, 117, 118 and 119.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 856, 857, 858, 859, 860, 861, 862, 863, 864, 865, 866, 867, 868, 869, 870, 871, 872, 873, 874 and 875.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NOs: 1385, 1386, 1387, 1388 and 1389.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 144, 145, 146, 147, 148 and 149.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 991, 992, 993, 994, 995, 996, 997, 998, 999, 1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011, 1012, 1013, 1014, 1015 and 1016.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NOs: 1409, 1410, 1411, 1412 and 1413.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NO: 150.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 1017, 1018, 1019, 1020 and 1021.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NO: 1414.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 109, 110, 111, 112 and 113.

According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising SEQ ID NOs: 814, 815, 816, 817, 818, 819, 820, 821, 822, 823, 824, 825, 826, 827, 829, 830, 831, 832, 833, 834, 835, 836, 837, 838, 839, 840, 841, 842, 843, 844, 845, 846, 847, 848, 849, 850, 851, 852, 853, 854 and 855.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising SEQ ID NOs: 1380, 1381, 1382, 1383 and 1384.

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding for.HSSTROL3_P4 (SEQID NO:1394), comprising a first amino acid sequence being at least 90%homologous toMAPAAWLRSAAARALLPPMLLLLLQPPPLLARALPPDVHHLHAERRGPQPWHAALPSSPAPAPATQEAPRPASSLRPPRCGVPDPSDGLSARNRQKRFVLSGGRWEKTDLTYRILRFPWQLVQEQVRQTMAEALKVWSDVTPLTFTEVHEGRADIMIDFARYW corresponding to amino acids 1-163 ofMM11_HUMAN (SEQ ID NO:1455), which also corresponds to amino acids 1-163of HSSTROL3_P4 (SEQ ID NO:1394), a bridging amino acid H correspondingto amino acid 164 of HSSTROL3_P4 (SEQ ID NO:1394), a second amino acidsequence being at least 90% homologous toGDDLPFDGPGGILAHAFFPKTHREGDVHFDYDETWTIGDDQGTDLLQVAAHEFGHVLGLQHTTAAKALMSAFYTFRYPLSLSPDDCRGVQHLYGQPWPTVTSRTPALGPQAGIDTNEIAPLEPDAPPDACEASFDAVSTIRGELFFFKAGFVWRLRGGQLQPGYPALASRHWQGLPSPVDAAFEDAQGHIWFFQGAQYWVYDGEKPVLGPAPLTELGLVRFPVHAALVWGPEKNKIYFFRGRDYWRFHPSTRRVDSPVPRRATDWRGVPSEIDAAFQDADG corresponding to amino acids 165-445 of MM11_HUMAN (SEQ ID NO:1455),which also corresponds to amino acids 165-445 of HSSTROL3_P4 (SEQ IDNO:1394), and a third amino acid sequence being at least 70%, optionallyat least 80%, preferably at least 85%, more preferably at least 90% andmost preferably at least 95% homologous to a polypeptide having thesequence ALGVRQLVGGGHSSRFSHLVVAGLPHACHRKSGSSSQVLCPEPSALLSVAG (SEQ ID NO:251) corresponding to amino acids 446-496 of HSSTROL3_P4 (SEQ IDNO:1394), wherein said first amino acid sequence, bridging amino acid,second amino acid sequence and third amino acid sequence are contiguousand in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSSTROL3_P4 (SEQ ID NO:1394), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ALGVRQLVGGGHSSRFSHLVVAGLPHACHRKSGSSSQVLCPEPSALLSVAG (SEQ ID NO: 251) in HSSTROL3_P4 (SEQ ID NO:1394).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding for HSSTROL3_P5 (SEQID NO:1395), comprising a first amino acid sequence being at least 90%homologous toMAPAAWLRSAAARALLPPMLLLLLQPPPLLARALPPDVHHLHAERRGPQPWHAALPSSPAPAPATQEAPRPASSLRPPRCGVPDPSDGLSARNRQKRFVLSGGRWEKTDLTYRILRFPWQLVQEQVRQTMAEALKVWSDVTPLTFTEVHEGRADIMIDFARYW corresponding to amino acids 1-163 ofMM11_HUMAN (SEQ ID NO:1455), which also corresponds to amino acids 1-163of HSSTROL3_P5 (SEQ ID NO:1395), a bridging amino acid H correspondingto amino acid 164 of HSSTROL3_P5 (SEQ ID NO:1395), a second amino acidsequence being at least 90% homologous toGDDLPFDGPGGILAHAFFPKTHREGDVHFDYDETWTIGDDQGTDLLQVAAHEFGHVLGLQHTTAAKALMSAFYTFRYPLSLSPDDCRGVQHLYGQPWPTVTSRTPALGPQAGIDTNEIAPLEPDAPPDACEASFDAVSTIRGELFFFKAGFVWRLRGGQLQPGYPALASRHWQGLPSPVDAAFEDAQGHIWFFQ corresponding toamino acids 165-358 of MM11_HUMAN (SEQ ID NO:1455), which alsocorresponds to amino acids 165-358 of HSSTROL3_P5 (SEQ ID NO:1395), anda third amino acid sequence being at least 70%, optionally at least 80%,preferably at least 85%, more preferably at least 90% and mostpreferably at least 95% homologous to a polypeptide having the sequenceELGFPSSTGRDESLEHCRCQGLHK (SEQ ID NO: 252) corresponding to amino acids359-382 of HSSTROL3_P5 (SEQ ID NO:1395), wherein said first amino acidsequence, bridging amino acid, second amino acid sequence and thirdamino acid sequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSSTROL3_P5 (SEQ ID NO:1395), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ELGFPSSTGRDESLEHCRCQGLHK (SEQ ID NO: 252) in HSSTROL3_P5 (SEQ ID NO:1395).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding for HSSTROL3_P7 (SEQID NO:1396), comprising a first amino acid sequence being at least 90%homologous toMAPAAWLRSAAARALLPPMLLLLLQPPPLLARALPPDVHHLHAERRGPQPWHAALPSSPAPAPATQEAPRPASSLRPPRCGVPDPSDGLSARNRQKRFVLSGGRWEKTDLTYRILRFPWQLVQEQVRQTMAEALKVWSDVTPLTFTEVHEGRADIMIDFARYW corresponding to amino acids 1-163 ofMM11_HUMAN (SEQ ID NO:1455), which also corresponds to amino acids 1-163of HSSTROL3_P7 (SEQ ID NO:1396), a bridging amino acid H correspondingto amino acid 164 of HSSTROL3_P7 (SEQ ID NO:1396), a second amino acidsequence being at least 90% homologous toGDDLPFDGPGGILAHAFFPKTHREGDVHFDYDETWTIGDDQGTDLLQVAAHEFGHVLGLQHTTAAKALMSAFYTFRYPLSLSPDDCRGVQHLYGQPWPTVTSRTPALGPQAGIDTNEIAPLEPDAPPDACEASFDAVSTIRGELFFFKAGFVWRLRGGQLQPGYPALASRHWQGLPSPVDAAFEDAQGHIWFFQG corresponding toamino acids 165-359 of MM11_HUMAN (SEQ ID NO:1455), which alsocorresponds to amino acids 165-359 of HSSTROL3_P7 (SEQ ID NO:1396), anda third amino acid sequence being at least 70%, optionally at least 80%,preferably at least 85%, more preferably at least 90% and mostpreferably at least 95% homologous to a polypeptide having the sequenceTTGVSTPAPGV (SEQ ID NO: 253) corresponding to amino acids 360-370 ofHSSTROL3_P7 (SEQ ID NO:1396), wherein said first amino acid sequence,bridging amino acid, second amino acid sequence and third amino acidsequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSSTROL3_P7 (SEQ ID NO:1396), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TTGVSTPAPGV (SEQ ID NO: 253) in HSSTROL3_P7 (SEQ ID NO:1396).
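The graded homology thresholds recited for the tail sequences (70%, 80%, 85%, 90% and 95%) can be illustrated with a minimal sketch. It assumes, purely for illustration, that homology is approximated by ungapped position-by-position identity between two peptides of equal length; the homology determination actually intended may rely on an alignment program and its parameters, which this sketch does not attempt to reproduce. The function names and the substituted peptide are hypothetical.

```python
from typing import List

# Threshold tiers (percent) recited in the text for tail homology.
TIERS = [70, 80, 85, 90, 95]


def percent_identity(candidate: str, reference: str) -> float:
    """Ungapped identity between two equal-length peptides.

    Simplifying assumption: no alignment and no gaps; residues are
    compared position by position.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    if len(candidate) != len(reference):
        raise ValueError("this sketch only compares equal-length peptides")
    matches = sum(a == b for a, b in zip(candidate, reference))
    return 100.0 * matches / len(reference)


def tiers_met(candidate: str, reference: str) -> List[int]:
    """Return every recited threshold that the candidate reaches."""
    pid = percent_identity(candidate, reference)
    return [t for t in TIERS if pid >= t]


TAIL_P7 = "TTGVSTPAPGV"          # SEQ ID NO: 253, as given in the text
HYPOTHETICAL = "TTGVSTPAPGA"     # illustrative peptide, one substitution

if __name__ == "__main__":
    print(round(percent_identity(HYPOTHETICAL, TAIL_P7), 1))
    print(tiers_met(HYPOTHETICAL, TAIL_P7))
```

Under this simplification, a single substitution in the 11-residue tail leaves 10 of 11 positions identical, so the peptide reaches the 90% tier but not the 95% tier.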

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding for HSSTROL3_P8 (SEQID NO:1397), comprising a first amino acid sequence being at least 90%homologous toMAPAAWLRSAAARALLPPMLLLLLQPPPLLARALPPDVHHLHAERRGPQPWHAALPSSPAPAPATQEAPRPASSLRPPRCGVPDPSDGLSARNRQKRFVLSGGRWEKTDLTYRILRFPWQLVQEQVRQTMAEALKVWSDVTPLTFTEVHEGRADIMIDFARYW corresponding to amino acids 1-163 ofMM11_HUMAN (SEQ ID NO:1455), which also corresponds to amino acids 1-163of HSSTROL3_P8 (SEQ ID NO:1397), a bridging amino acid H correspondingto amino acid 164 of HSSTROL3_P8 (SEQ ID NO:1397), a second amino acidsequence being at least 90% homologous toGDDLPFDGPGGILAHAFFPKTHREGDVHFDYDETWTIGDDQGTDLLQVAAHEFGHVLGLQHTTAAKALMSAFYTFRYPLSLSPDDCRGVQHLYGQPWPTVTSRTPALGPQAGIDTNEIAPLE corresponding toamino acids 165-286 of MM11_HUMAN (SEQ ID NO:1455), which alsocorresponds to amino acids 165-286 of HSSTROL3_P8 (SEQ ID NO:1397), anda third amino acid sequence being at least 70%, optionally at least 80%,preferably at least 85%, more preferably at least 90% and mostpreferably at least 95% homologous to a polypeptide having the sequenceVRPCLPVPLLLCWPL (SEQ ID NO: 254) corresponding to amino acids 287-301 ofHSSTROL3_P8 (SEQ ID NO:1397), wherein said first amino acid sequence,bridging amino acid, second amino acid sequence and third amino acidsequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSSTROL3_P8 (SEQ ID NO:1397), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRPCLPVPLLLCWPL (SEQ ID NO: 254) in HSSTROL3_P8 (SEQ ID NO:1397).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding for HSSTROL3_P9 (SEQID NO:1398), comprising a first amino acid sequence being at least 90%homologous toMAPAAWLRSAAARALLPPMLLLLLQPPPLLARALPPDVHHLHAERRGPQPWHAALPSSPAPAPATQEAPRPASSLRPPRCGVPDPSDGLSARNRQK corresponding to amino acids 1-96 ofMM11_HUMAN (SEQ ID NO:1455), which also corresponds to amino acids 1-96of HSSTROL3_P9 (SEQ ID NO:1398), a second amino acid sequence being atleast 90% homologous toRILRFPWQLVQEQVRQTMAEALKVWSDVTPLTFTEVHEGRADIMIDFARYW corresponding toamino acids 113-163 of MM11_HUMAN (SEQ ID NO:1455), which alsocorresponds to amino acids 97-147 of HSSTROL3_P9 (SEQ ID NO:1398), abridging amino acid H corresponding to amino acid 148 of HSSTROL3_P9(SEQ ID NO:1398), a third amino acid sequence being at least 90%homologous toGDDLPFDGPGGILAHAFFPKTHREGDVHFDYDETWTIGDDQGTDLLQVAAHEFGHVLGLQHTTAAKALMSAFYTFRYPLSLSPDDCRGVQHLYGQPWPTVTSRTPALGPQAGIDTNEIAPLEPDAPPDACEASFDAVSTIRGELFFFKAGFVWRLRGGQLQPGYPALASRHWQGLPSPVDAAFEDAQGHIWFFQG corresponding toamino acids 165-359 of MM11_HUMAN (SEQ ID NO:1455), which alsocorresponds to amino acids 149-343 of HSSTROL3_P9 (SEQ ID NO:1398), anda fourth amino acid sequence being at least 70%, optionally at least80%, preferably at least 85%, more preferably at least 90% and mostpreferably at least 95% homologous to a polypeptide having the sequenceTTGVSTPAPGV (SEQ ID NO: 253) corresponding to amino acids 344-354 ofHSSTROL3_P9 (SEQ ID NO:1398), wherein said first amino acid sequence,second amino acid sequence, bridging amino acid, third amino acidsequence and fourth amino acid sequence are contiguous and in asequential order.
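The coordinate bookkeeping for HSSTROL3_P9 can be checked with a short sketch. It records each recited segment with its amino acid range on the variant and, where one is given, on MM11_HUMAN, then verifies that the variant ranges are contiguous and that paired ranges have equal length. The class and function names are illustrative assumptions; only the numeric ranges come from the text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # 1-based, inclusive amino acid range


@dataclass(frozen=True)
class Segment:
    label: str
    variant: Span               # position on HSSTROL3_P9
    reference: Optional[Span]   # position on MM11_HUMAN, None if no counterpart


def span_length(span: Span) -> int:
    return span[1] - span[0] + 1


# Ranges as recited for HSSTROL3_P9 (SEQ ID NO:1398).
P9_SEGMENTS: List[Segment] = [
    Segment("first sequence", (1, 96), (1, 96)),
    Segment("second sequence", (97, 147), (113, 163)),
    Segment("bridging amino acid H", (148, 148), None),
    Segment("third sequence", (149, 343), (165, 359)),
    Segment("fourth sequence (tail)", (344, 354), None),
]


def check_layout(segments: List[Segment]) -> int:
    """Return the total variant length, raising if the layout is inconsistent."""
    expected_start = 1
    for seg in segments:
        if seg.variant[0] != expected_start:
            raise ValueError(f"{seg.label}: variant range is not contiguous")
        if seg.reference is not None and (
            span_length(seg.reference) != span_length(seg.variant)
        ):
            raise ValueError(f"{seg.label}: paired ranges differ in length")
        expected_start = seg.variant[1] + 1
    return expected_start - 1


if __name__ == "__main__":
    print(check_layout(P9_SEGMENTS))
```

The same check could be applied to the other variants by substituting their recited ranges. For HSSTROL3_P9 the tail range 344-354 spans 11 residues, which matches the length of TTGVSTPAPGV (SEQ ID NO: 253).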

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HSSTROL3_P9 (SEQ ID NO:1398), comprising a polypeptide having a length “n”, wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise KR, having a structure as follows: a sequence starting from any of amino acid numbers 96−x to 96; and ending at any of amino acid numbers 97+((n−2)−x), in which x varies from 0 to n−2.
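The edge-portion structure can be made concrete by enumerating the start and end positions that the formula permits. The following is a minimal sketch, assuming 1-based inclusive amino acid numbering on HSSTROL3_P9 and taking positions 96 and 97 as the junction residues (K and R); the function name and its parameters are hypothetical.

```python
from typing import List, Tuple


def edge_windows(n: int, left: int = 96) -> List[Tuple[int, int]]:
    """Enumerate (start, end) positions of length-n peptides spanning a junction.

    For each x from 0 to n-2, the peptide starts at left - x and ends at
    (left + 1) + ((n - 2) - x), so every window has length n and contains
    both junction residues, left and left + 1.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to span both junction residues")
    right = left + 1
    return [(left - x, right + (n - 2) - x) for x in range(n - 1)]


if __name__ == "__main__":
    for start, end in edge_windows(10):
        print(start, end)
```

For n = 10 the formula yields nine windows, running from positions 96-105 (x = 0) to positions 88-97 (x = 8); each spans exactly ten residues and includes both position 96 and position 97.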

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSSTROL3_P9 (SEQ ID NO:1398), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TTGVSTPAPGV (SEQ ID NO: 253) in HSSTROL3_P9 (SEQ ID NO:1398).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding for HUMCA1XIA_P14(SEQ ID NO:1372), comprising a first amino acid sequence being at least90% homologous toMEPWSSRWKTKRWLWDFTVTTLALTFLFQAREVRGAAPVDVLKALDFHNSPEGISKTTGFCTNRKNSKGSDTAYRVSKQAQLSAPTKQLFPGGTFPEDFSILFTVKPKKGIQSFLLSIYNEHGIQQIGVEVGRSPVFLFEDHTGKPAPEDYPLFRTVNIADGKWHRVAISVEKKTVTMIVDCKKKTTKPLDRSERAIVDTNGITVFGTRILDEEVFEGDIQQFLITGDPKAAYDYCEHYSPDCDSSAPKAAQAQEPQIDEYAPEDIIEYDYEYGEAEYKEAESVTEGPTVTEETIAQTEANIVDDFQEYNYGTMESYQTEAPRHVSGTNEPNPVEEIFTEEYLTGEDYDSQRKNSEDTLYENKEIDGRDSDLLVDGDLGEYDFYEYKEYEDKPTSPPNEEFGPGVPAETDITETSINGHGAYGEKGQKGEPAVVEPGMLVEGPPGPAGPAGIMGPPGLQGPTGPPGDPGDRGPPGRPGLPGADGLPGPPGTMLMLPFRYGGDGSKGPTISAQEAQAQAILQQARIALRGPPGPMGLTGRPGPVGGPGSSGAKGESGDPGPQGPRGVQGPPGPTGKPGKRGRPGADGGRGMPGEPGAKGDRGFDGLPGLPGDKGHRGERGPQGPPGPPGDDGMRGEDGEIGPRGLPGEAGPRGLLGPRGTPGAPGQPGMAGVDGPPGPKGNMGPQGEPGPPGQQGNPGPQGLPGPQGPIGPPGEKGPQGKPGLAGLPGADGPPGHPGKEGQSGEKGALGPPGPQGPIGYPGPRGVKGADGVRGLKGSKGEKGEDGFPGFKGDMGLKGDRGEVGQIGPRGEDGPEGPKGRAGPTGDPGPSGQAGEKGKLGVPGLPGYPGRQGPKGSTGFPGFPGANGEKGARGVAGKPGPRGQRGPTGPRGSRGARGPTGKPGPKGTSGGDGPPGPPGERGPQGPQGPVGFPGPKGPPGPPGKDGLPGHPGQRGETGFQGKTGPPGPGGVVGPQGPTGETGPIGERGHPGPPGPPGEQGLPGAAGKEGAKGDPGPQGISGKDGPAGLRGFPGERGLPGAQGAPGLKGGEGPQGPPGP Vcorresponding to amino acids 1-1056 of CA1B_HUMAN_V5 (SEQ ID NO:1447),which also corresponds to amino acids 1-1056 of HUMCA1XIA_P14 (SEQ IDNO:1372), and a second amino acid sequence being at least 70%,optionally at least 80%, preferably at least 85%, more preferably atleast 90% and most preferably at least 95% homologous to a polypeptidehaving the sequence VSMMIINSQTIMVVNYSSSFITLML (SEQ ID NO: 256)corresponding to amino acids 1057-1081 of HUMCA1XIA_P14 (SEQ IDNO:1372), wherein said first amino acid sequence and second amino acidsequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMCA1XIA_P14 (SEQ ID NO:1372), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VSMMIINSQTIMVVNYSSSFITLML (SEQ ID NO: 256) in HUMCA1XIA_P14 (SEQ ID NO:1372).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding for HUMCA1XIA_P15(SEQ ID NO:1373), comprising a first amino acid sequence being at least90% homologous toMEPWSSRWKTKRWLWDFTVTTLALTFLFQAREVRGAAPVDVLKALDFHNSPEGISKTTGFCTNRKNSKGSDTAYRVSKQAQLSAPTKQLFPGGTFPEDFSILFTVKPKKGIQSFLLSIYNEHGIQQIGVEVGRSPVFLFEDHTGKPAPEDYPLFRTVNIADGKWHRVAISVEKKTVTMIVDCKKKTTKPLDRSERAIVDTNGITVFGTRILDEEVFEGDIQQFLITGDPKAAYDYCEHYSPDCDSSAPKAAQAQEPQIDEYAPEDIIEYDYEYGEAEYICEAESVTEGPTVTEETIAQTEANIVDDFQEYNYGTMESYQTEAPRHVSGTNEPNPVEEIFTEEYLTGEDYDSQRKNSEDTLYENKEIDGRDSDLLVDGDLGEYDFYEYKEYEDKPTSPPNEEFGPGVPAETDITETSINGHGAYGEKGQKGEPAVVEPGMLVEGPPGPAGPAGIMGPPGLQGPTGPPGDPGDRGPPGRPGLPGADGLPGTMLMLPFRYGGDGSKGPTISAQEAQAQAILQQARIALRGPPGPMGLTGRPGPVGGPGSSGAKGESGDPGPQGPRGVQGPPGPTGKPGKRGRPGADGGRGMPGEPGAKGDRGFDGLPGLPGDKGHRGERGPQGPPGPPGDDGMRGEDGEIGPRGLPGEAGPRGLLGPRGTPGAPGQPGMAGVDGPPGPKGNMGPQGEPGPPGQQGNPGPQGLPGPQGPIGPPGEK corresponding to amino acids 1-714 of CA1B_HUMAN (SEQ IDNO:1446), which also corresponds to amino acids 1-714 of HUMCA1XIA_P15(SEQ ID NO:1373), and a second amino acid sequence being at least 70%,optionally at least 80%, preferably at least 85%, more preferably atleast 90% and most preferably at least 95% homologous to a polypeptidehaving the sequence MCCNLSFGILIPLQK (SEQ ID NO: 257) corresponding toamino acids 715-729 of HUMCA1XIA_P15 (SEQ ID NO:1373), wherein saidfirst amino acid sequence and second amino acid sequence are contiguousand in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMCA1XIA_P15 (SEQ ID NO:1373), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MCCNLSFGILIPLQK (SEQ ID NO: 257) in HUMCA1XIA_P15 (SEQ ID NO:1373).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding for HUMCA1XIA_P16(SEQ ID NO:1374), comprising a first amino acid sequence being at least90% homologous toMEPWSSRWKTKRWLWDFTVTTLALTFLFQAREVRGAAPVDVLKALDFHNSPEGISKTTGFCTNRKNSKGSDTAYRVSKQAQLSAPTKQLFPGGTFPEDFSILFTVKPKKGIQSFLLSIYNEHGIQQIGVEVGRSPVFLFEDHTGKPAPEDYPLFRTVNIADGKWHRVAISVEKKTVTMIVDCKKKTTKPLDRSERAIVDTNGITVFGTRILDEEVFEGDIQQFLITGDPKAAYDYCEHYSPDCDSSAPKAAQAQEPQIDEYAPEDIIEYDYEYGEAEYKEAESVTEGPTVTEETIAQTEANIVDDFQEYNYGTMESYQTEAPRHVSGTNEPNPVEEIFTEEYLTGEDYDSQRKNSEDTLYENKEIDGRDSDLLVDGDLGEYDFYEYKEYEDKPTSPPNEEFGPGVPAETDITETSINGHGAYGEKGQKGEPAVVEPGMLVEGPPGPAGPAGIMGPPGLQGPTGPPGDPGDRGPPGRPGLPGADGLPGPPGTMLMLPFRYGGDGSKGPTISAQEAQAQAILQQARIALRGPPGPMGLTGRPGPVGGPGSSGAKGESGDPGPQGPRGVQGPPGPTGKPGKRGRPGADGGRGMPGEPGAKGDRGFDGLPGLPGDKGHRGERGPQGPPGPPGDDGMRGEDGEIGPRGLPGEA corresponding to amino acids 1-648 of CA1B_HUMAN (SEQ IDNO:1446), which also corresponds to amino acids 1-648 of HUMCA1XIA_P16(SEQ ID NO:1374), a second amino acid sequence being at least 90%homologous to GMAGVDGPPGPKGNMGPQGEPGPPGQQGNPGPQGLPGPQGPIGPPGEKcorresponding to amino acids 667-714 of CA1B_HUMAN (SEQ ID NO:1446),which also corresponds to amino acids 649-696 of HUMCA1XIA_P16 (SEQ IDNO:1374), and a third amino acid sequence being at least 70%, optionallyat least 80%, preferably at least 85%, more preferably at least 90% andmost preferably at least 95% homologous to a polypeptide having thesequence VSFSFSLFYKKVIKFACDKRFVGRHDERKVVKLSLPLYLIYE (SEQ ID NO: 258)corresponding to amino acids 697-738 of HUMCA1XIA_P16 (SEQ ID NO:1374),wherein said first amino acid sequence, second amino acid sequence andthird amino acid sequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HUMCA1XIA_P16 (SEQ ID NO:1374), comprising a polypeptide having a length “n”, wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise AG, having a structure as follows: a sequence starting from any of amino acid numbers 648−x to 648; and ending at any of amino acid numbers 649+((n−2)−x), in which x varies from 0 to n−2.
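The start and end expressions for the edge portion can be enumerated directly. The following is a minimal sketch of that arithmetic only; the function name `edge_windows` and the optional `protein_length` bound are illustrative assumptions, and the length of 738 is taken from the tail coordinates (697-738) given above for HUMCA1XIA_P16.

```python
def edge_windows(junction, n, protein_length=None):
    """List (start, end) pairs, 1-based and inclusive, for edge portions.

    For each x from 0 to n-2 the window starts at junction - x and ends
    at (junction + 1) + ((n - 2) - x), so every window has length n and
    contains both residues flanking the junction.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to span both junction residues")
    windows = []
    for x in range(0, n - 1):  # x varies from 0 to n-2
        start = junction - x
        end = junction + 1 + (n - 2) - x
        if start < 1:
            continue  # window would begin before the first residue
        if protein_length is not None and end > protein_length:
            continue  # window would run past the last residue
        windows.append((start, end))
    return windows


# Junction between amino acids 648 and 649 (the "AG" pair), with n = 10.
for start, end in edge_windows(648, 10, protein_length=738):
    print(start, end)
```

For n = 10 this yields nine windows, from 648-657 (x = 0) to 640-649 (x = 8), each ten residues long and each covering positions 648 and 649.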

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMCA1XIA_P16 (SEQ ID NO:1374), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VSFSFSLFYKKVIKFACDKRFVGRHDERKVVKLSLPLYLIYE (SEQ ID NO: 258) in HUMCA1XIA_P16 (SEQ ID NO:1374).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding for HUMCA1XIA_P17(SEQ ID NO:1375), comprising a first amino acid sequence being at least90% homologous toMEPWSSRWKTKRWLWDFTVTTLALTFLFQAREVRGAAPVDVLKALDFHNSPEGISKTTGFCTNRKNSKGSDTAYRVSKQAQLSAPTKQLFPGGTFPEDFSILFTVKPKKGIQSFLLSIYNEHGIQQIGVEVGRSPVFLFEDHTGKPAPEDYPLFRTVNIADGKWHRVAISVEKKTVTMIVDCKKKTTKPLDRSERAIVDTNGITVFGTRILDEEVFEGDIQQFLITGDPKAAYDYCEHYSPDCDSSAPKAAQAQEPQIDE corresponding to aminoacids 1-260 of CA1B_HUMAN (SEQ ID NO:1446), which also corresponds toamino acids 1-260 of HUMCA1XIA_P17 (SEQ ID NO:1375), and a second aminoacid sequence being at least 70%, optionally at least 80%, preferably atleast 85%, more preferably at least 90% and most preferably at least 95%homologous to a polypeptide having the sequence VRSTRPEKVFVFQ (SEQ IDNO: 259) corresponding to amino acids 261-273 of HUMCA1XIA_P17 (SEQ IDNO:1375), wherein said first amino acid sequence and second amino acidsequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMCA1XIA_P17 (SEQ ID NO:1375), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRSTRPEKVFVFQ (SEQ ID NO: 259) in HUMCA1XIA_P17 (SEQ ID NO:1375).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R20779_P2 (SEQ ID NO:1402), comprising a first amino acid sequence being at least 90% homologous to MCAERLGQFMTLALVLATFDPARGTDATNPPEGPQDRSSQQKGRLSLQNTAEIQHCLVNAGDVGCGVFECFENNSCEIRGLHGICMTFLHNAGKFDAQGKSFIKDALKCKAHALRHRFGCISRKCPAIREMVSQLQRECYLKHDLCAAAQENTRVIVEMIHFKDLLLHE corresponding to amino acids 1-169 of STC2_HUMAN (SEQ ID NO:1458), which also corresponds to amino acids 1-169 of R20779_P2 (SEQ ID NO:1402), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence CYKIEITMPKRRKVKLRD (SEQ ID NO: 260) corresponding to amino acids 170-187 of R20779_P2 (SEQ ID NO:1402), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R20779_P2 (SEQ ID NO:1402), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence CYKIEITMPKRRKVKLRD (SEQ ID NO: 260) in R20779_P2 (SEQ ID NO:1402).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMOSTRO_PEA_(—)1_PEA_(—)1_P21 (SEQ ID NO:1627), comprising a first amino acid sequence being at least 90% homologous to MRIAVICFCLLGITCAIPVKQADSGSSEEKQLYNKYPDAVATWLNPDPSQKQNLLAPQ corresponding to amino acids 1-58 of OSTP_HUMAN (SEQ ID NO:1462), which also corresponds to amino acids 1-58 of HUMOSTRO_PEA_(—)1_PEA_(—)1_P21 (SEQ ID NO:1627), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VFLNFS (SEQ ID NO: 261) corresponding to amino acids 59-64 of HUMOSTRO_PEA_(—)1_PEA_(—)1_P21 (SEQ ID NO:1627), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMOSTRO_PEA_(—)1_PEA_(—)1_P21 (SEQ ID NO:1627), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VFLNFS (SEQ ID NO: 261) in HUMOSTRO_PEA_(—)1_PEA_(—)1_P21 (SEQ ID NO:1627).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMOSTRO_PEA_(—)1_PEA_(—)1_P25 (SEQ ID NO:1628), comprising a first amino acid sequence being at least 90% homologous to MRIAVICFCLLGITCAIPVKQADSGSSEEKQ corresponding to amino acids 1-31 of OSTP_HUMAN (SEQ ID NO:1462), which also corresponds to amino acids 1-31 of HUMOSTRO_PEA_(—)1_PEA_(—)1_P25 (SEQ ID NO:1628), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence H corresponding to amino acids 32-32 of HUMOSTRO_PEA_(—)1_PEA_(—)1_P25 (SEQ ID NO:1628), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMOSTRO_PEA_(—)1_PEA_(—)1_P30 (SEQ ID NO:1629), comprising a first amino acid sequence being at least 90% homologous to MRIAVICFCLLGITCAIPVKQADSGSSEEKQ corresponding to amino acids 1-31 of OSTP_HUMAN (SEQ ID NO:1462), which also corresponds to amino acids 1-31 of HUMOSTRO_PEA_(—)1_PEA_(—)1_P30 (SEQ ID NO:1629), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VSIFYVFI (SEQ ID NO: 262) corresponding to amino acids 32-39 of HUMOSTRO_PEA_(—)1_PEA_(—)1_P30 (SEQ ID NO:1629), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMOSTRO_PEA_(—)1_PEA_(—)1_P30 (SEQ ID NO:1629), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VSIFYVFI (SEQ ID NO: 262) in HUMOSTRO_PEA_(—)1_PEA_(—)1_P30 (SEQ ID NO:1629).
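The "contiguous and in a sequential order" arrangement of a head and a tail can be checked against the stated residue numbering. The sketch below uses the two sequences quoted above for the P30 variant; the helper `residues` and the variable names are illustrative assumptions, not part of the original disclosure.

```python
# Sequences as quoted in the text for the P30 variant.
HEAD = "MRIAVICFCLLGITCAIPVKQADSGSSEEKQ"  # amino acids 1-31 of OSTP_HUMAN
TAIL = "VSIFYVFI"                         # SEQ ID NO: 262, amino acids 32-39


def residues(sequence: str, start: int, end: int) -> str:
    """Return the residues from start to end, 1-based and inclusive."""
    return sequence[start - 1:end]


# Contiguous and in sequential order: the tail directly follows the head.
variant = HEAD + TAIL

print(len(variant))
print(residues(variant, 1, 31) == HEAD)
print(residues(variant, 32, 39) == TAIL)
```

The concatenation is 39 residues long, so the coordinates 1-31 and 32-39 recited for the two portions account for the whole variant with no gap or overlap.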

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding forHUMPHOSLIP_PEA_(—)2_P10 (SEQ ID NO:1327), comprising a first amino acidsequence being at least 90% homologous toMALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETITIPDLRGKEGHFYYNISEcorresponding to amino acids 1-67 of PLTP_HUMAN (SEQ ID NO:1433), whichalso corresponds to amino acids 1-67 of HUMPHOSLIP_PEA_(—)2_P10 (SEQ IDNO:1327), and a second amino acid sequence being at least 90% homologoustoKVYDFLSTFITSGMRFLLNQQICPVLYHAGTVLLNSLLDTVPVRSSVDELVGIDYSLMKDPVASTSNLDMDFRGAFFPLTERNWSLPNRAVEPQLQEEERMVYVAFSEFFFDSAMESYFRAGALQLLLVGDKVPHDLDMLLRATYFGSIVLLSPAVIDSPLKLELRVLAPPRCTIKPSGTTISVTASVTIALVPPDQPEVQLSSMTMDARLSAKMALRGKALRTQLDLRRFRIYSNHSALESLALIPLQAPLKTMLQIGVMPMLNERTWRGVQIPLPEGINFVHEVVTNHAGFLTIGADLHFAKGLREVIEKNRPADVRASTAPTPSTAAV corresponding to aminoacids 163-493 of PLTP_HUMAN (SEQ ID NO:1433), which also corresponds toamino acids 68-398 of HUMPHOSLIP_PEA_(—)2_P10 (SEQ ID NO:1327), whereinsaid first amino acid sequence and second amino acid sequence arecontiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HUMPHOSLIP_PEA_(—)2_P10 (SEQ ID NO:1327), comprising a polypeptide having a length “n”, wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise EK, having a structure as follows: a sequence starting from any of amino acid numbers 67−x to 67; and ending at any of amino acid numbers 68+((n−2)−x), in which x varies from 0 to n−2.
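The coordinate pairs recited for the P10 variant (amino acids 1-67 and 163-493 of PLTP_HUMAN mapping to 1-67 and 68-398 of the variant) can be cross-checked arithmetically. This is a minimal sketch under the assumption that each segment maps residue for residue; the `Segment` type and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    known_start: int    # first residue in the known protein
    known_end: int      # last residue in the known protein
    variant_start: int  # first residue in the variant
    variant_end: int    # last residue in the variant


def check_segments(segments):
    """Return a list of inconsistencies found in the coordinate mapping."""
    problems = []
    for i, seg in enumerate(segments):
        known_len = seg.known_end - seg.known_start + 1
        variant_len = seg.variant_end - seg.variant_start + 1
        if known_len != variant_len:
            problems.append(f"segment {i}: lengths differ")
        if i > 0 and seg.variant_start != segments[i - 1].variant_end + 1:
            problems.append(f"segment {i}: not contiguous in the variant")
    return problems


def skipped_residues(first: Segment, second: Segment) -> int:
    """Number of known-protein residues absent between two segments."""
    return second.known_start - first.known_end - 1


# Coordinates as stated in the text for the P10 variant.
p10 = [Segment(1, 67, 1, 67), Segment(163, 493, 68, 398)]
print(check_segments(p10))
print(skipped_residues(p10[0], p10[1]))
```

Under these coordinates both segments have matching lengths (67 and 331 residues), they abut in the variant at positions 67 and 68 (the "EK" edge), and residues 68-162 of the known protein, 95 in total, are absent from the variant.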

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding forHUMPHOSLIP_PEA_(—)2_P12 (SEQ ID NO:1328), comprising a first amino acidsequence being at least 90% homologous toMALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETITIPDLRGKEGHFYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLRFRRQLLYWFFYDGGYINASAEGVSIRTGLELSRDPAGRMKVSNVSCQASVSRMHAAFGGTFKKVYDFLSTFITSGMRFLLNQQICPVLYHAGTVLLNSLLDTVPVRSSVDELVGIDYSLMKDPVASTSNLDMDFRGAFFPLTERNWSLPNRAVEPQLQEEERMVYVAFSEFFFDSAMESYFRAGALQLLLVGDKVPHDLDMLLRATYFGSIVLLSPAVIDSPLKLELRVLAPPRCTIKPSGTTISVTASVTIALVPPDQPEVQLSSMTMDARLSAKMALRGKALRTQLDLRRFRIYSNHSALESLALIPLQAPLKTMLQIGVMPMLNcorresponding to amino acids 1-427 of PLTP_HUMAN (SEQ ID NO:1433), whichalso corresponds to amino acids 1-427 of HUMPHOSLIP_PEA_(—)2_P12 (SEQ IDNO:1328), and a second amino acid sequence being at least 70%,optionally at least 80%, preferably at least 85%, more preferably atleast 90% and most preferably at least 95% homologous to a polypeptidehaving the sequence GKAGV (SEQ ID NO: 263) corresponding to amino acids428-432 of HUMPHOSLIP_PEA_(—)2_P12 (SEQ ID NO:1328), wherein said firstamino acid sequence and second amino acid sequence are contiguous and ina sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMPHOSLIP_PEA_(—)2_P12 (SEQ ID NO:1328), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GKAGV (SEQ ID NO: 263) in HUMPHOSLIP_PEA_(—)2_P12 (SEQ ID NO:1328).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMPHOSLIP_PEA_(—)2_P31 (SEQ ID NO:1330), comprising a first amino acid sequence being at least 90% homologous to MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETITIPDLRGKEGHFYYNISE corresponding to amino acids 1-67 of PLTP_HUMAN (SEQ ID NO:1433), which also corresponds to amino acids 1-67 of HUMPHOSLIP_PEA_(—)2_P31 (SEQ ID NO:1330), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence PGLERGADKFPVVGGSSLFLALDLTLRPPVG (SEQ ID NO: 264) corresponding to amino acids 68-98 of HUMPHOSLIP_PEA_(—)2_P31 (SEQ ID NO:1330), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMPHOSLIP_PEA_(—)2_P31 (SEQ ID NO:1330), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence PGLERGADKFPVVGGSSLFLALDLTLRPPVG (SEQ ID NO: 264) in HUMPHOSLIP_PEA_(—)2_P31 (SEQ ID NO:1330).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMPHOSLIP_PEA_(—)2_P33 (SEQ ID NO:1331), comprising a first amino acid sequence being at least 90% homologous to MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETITIPDLRGKEGHFYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLRFRRQLLYWFFYDGGYINASAEGVSIRTGLELSRDPAGRMKVSNVSCQASVSRMHAAFGGTFKKVYDFLSTFITSGMRFLLNQQ corresponding to amino acids 1-183 of PLTP_HUMAN (SEQ ID NO:1433), which also corresponds to amino acids 1-183 of HUMPHOSLIP_PEA_(—)2_P33 (SEQ ID NO:1331), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VWAATGRRVARVGMLSL (SEQ ID NO: 265) corresponding to amino acids 184-200 of HUMPHOSLIP_PEA_(—)2_P33 (SEQ ID NO:1331), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMPHOSLIP_PEA_(—)2_P33 (SEQ ID NO:1331), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VWAATGRRVARVGMLSL (SEQ ID NO: 265) in HUMPHOSLIP_PEA_(—)2_P33 (SEQ ID NO:1331).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding forHUMPHOSLIP_PEA_(—)2_P34 (SEQ ID NO:1332), comprising a first amino acidsequence being at least 90% homologous toMALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETITIPDLRGKEGHFYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLRFRRQLLYWFFYDGGYINASAEGVSIRTGLELSRDPAGRMKVSNVSCQASVSRMHAAFGGTFKKVYDFLSTFITSGMRFLLNQQICPVLYHAGTVLLNSLLDTVPVcorresponding to amino acids 1-205 of PLTP_HUMAN (SEQ ID NO:1433), whichalso corresponds to amino acids 1-205 of HUMPHOSLIP_PEA_(—)2_P34 (SEQ IDNO:1332), and a second amino acid sequence being at least 70%,optionally at least 80%, preferably at least 85%, more preferably atleast 90% and most preferably at least 95% homologous to a polypeptidehaving the sequence LWTSLLALTIPS (SEQ ID NO: 266) corresponding to aminoacids 206-217 of HUMPHOSLIP_PEA_(—)2_P34 (SEQ ID NO:1332), wherein saidfirst amino acid sequence and second amino acid sequence are contiguousand in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMPHOSLIP_PEA_(—)2_P34 (SEQ ID NO:1332), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LWTSLLALTIPS (SEQ ID NO: 266) in HUMPHOSLIP_PEA_(—)2_P34 (SEQ ID NO:1332).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMPHOSLIP_PEA_(—)2_P35 (SEQ ID NO:1333), comprising a first amino acid sequence being at least 90% homologous to MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETITIPDLRGKEGHFYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLRFRRQLLYWF corresponding to amino acids 1-109 of PLTP_HUMAN (SEQ ID NO:1433), which also corresponds to amino acids 1-109 of HUMPHOSLIP_PEA_(—)2_P35 (SEQ ID NO:1333), a second, bridging amino acid sequence comprising L, a third amino acid sequence being at least 90% homologous to KVYDFLSTFITSGMRFLLNQQ corresponding to amino acids 163-183 of PLTP_HUMAN (SEQ ID NO:1433), which also corresponds to amino acids 111-131 of HUMPHOSLIP_PEA_(—)2_P35 (SEQ ID NO:1333), and a fourth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VWAATGRRVARVGMLSL (SEQ ID NO: 265) corresponding to amino acids 132-148 of HUMPHOSLIP_PEA_(—)2_P35 (SEQ ID NO:1333), wherein said first amino acid sequence, second amino acid sequence, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for an edge portion of HUMPHOSLIP_PEA_(—)2_P35 (SEQ ID NO:1333), comprising a polypeptide having a length “n”, wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise FLK, having a structure as follows (numbering according to HUMPHOSLIP_PEA_(—)2_P35 (SEQ ID NO:1333)): a sequence starting from any of amino acid numbers 109−x to 109; and ending at any of amino acid numbers 111+((n−2)−x), in which x varies from 0 to n−2.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMPHOSLIP_PEA_(—)2_P35 (SEQ ID NO:1333), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VWAATGRRVARVGMLSL (SEQ ID NO: 265) in HUMPHOSLIP_PEA_(—)2_P35 (SEQ ID NO:1333).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding forR38144_PEA_(—)2_P6 (SEQ ID NO:1403), comprising a first amino acidsequence being at least 90% homologous toMPFRLLIPLGLLCALLPQHHGAPGPDGSAPDPAHYRERVKAMFYHAYDSYLENAFPFDELRPLTCDGHDTWGSFSLTLIDALDTLLILGNVSEFQRVVEVLQDSVDFDIDVNASVFETNIRVVGGLLSAHLLSKKAGVEVEAGWPCSGPLLRMAEEAARKLLPAFQTPTGMPYGTVNLLHGVNPGETPVTCTAGIGTFIVEFATLSSLTGDPVFEDVARVALMRLWESRSDIGLVGNHIDVLTGKWVAQDAGIGAGVDSYFEYLVKGAILLQDKKLMAMFLEYNKAIRNYTRFDDWYLWVQMYKGTVSMPVFQSLEAYWPGLQSLIGDIDNAMRTFLNYYTVWKQFGGLPEFYNIPQGYTVEKREGYPLRPELIESAMYLYRATGDPTLLELGRDAVESIEKISKVECGFATcorresponding to amino acids 1-412 of CT31_HUMAN (SEQ ID NO:1459), whichalso corresponds to amino acids 1-412 of R38144_PEA_(—)2_P6 (SEQ IDNO:1403), and a second amino acid sequence being at least 70%,optionally at least 80%, preferably at least 85%, more preferably atleast 90% and most preferably at least 95% homologous to a polypeptidehaving the sequence LASFSHMSDQRSARPQAGQPHGVVLPGRDCEIPLPPV (SEQ ID NO:268) corresponding to amino acids 413-449 of R38144_PEA_(—)2_P6 (SEQ IDNO:1403), wherein said first amino acid sequence and second amino acidsequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R38144_PEA_(—)2_P6 (SEQ ID NO:1403), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LASFSHMSDQRSARPQAGQPHGVVLPGRDCEIPLPPV (SEQ ID NO: 268) in R38144_PEA_(—)2_P6 (SEQ ID NO:1403).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding forR38144_PEA_(—)2_P13 (SEQ ID NO:1404), comprising a first amino acidsequence being at least 90% homologous toMPFRLLIPLGLLCALLPQHHGAPGPDGSAPDPAHYRERVKAMFYHAYDSYLENAFPFDELRPLTCDGHDTWGSFSLTLIDALDTLLILGNVSEFQRVVEVLQDSVDFDIDVNASVFETNIRVVGGLLSAHLLSKKAGVEVEAGWPCSGPLLRMAEEAARKLLPAFQTPTGMPYGTVNLLHGVNPGETPVTCTAGIGTFIVEFATLSSLTGDPVFEDVARVALMRLWESRSDIGLVGNHIDVLTGKWVAQDAGIGAGVDSYFEYLVKGAILLQDKKLMAMFLEYNKAIRNYTRFDDWYLWVQMYKGTVSMPVFQSLEAYWPGLQ corresponding to amino acids1-323 of CT31_HUMAN (SEQ ID NO:1459), which also corresponds to aminoacids 1-323 of R38144_PEA_(—)2_P13 (SEQ ID NO:1404), and a second aminoacid sequence being at least 70%, optionally at least 80%, preferably atleast 85%, more preferably at least 90% and most preferably at least 95%homologous to a polypeptide having the sequence NLLKAQCTSTVPRGIPPS (SEQID NO: 269) corresponding to amino acids 324-341 of R38144_PEA_(—)2_P13(SEQ ID NO:1404), wherein said first amino acid sequence and secondamino acid sequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R38144_PEA_(—)2_P13 (SEQ ID NO:1404), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NLLKAQCTSTVPRGIPPS (SEQ ID NO: 269) in R38144_PEA_(—)2_P13 (SEQ ID NO:1404).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R38144_PEA_(—)2_P15 (SEQ ID NO:1405), comprising a first amino acid sequence being at least 90% homologous to MPFRLLIPLGLLCALLPQHHGAPGPDGSAPDPAHYRERVKAMFYHAYDSYLENAFPFDELRPLTCDGHDTWGSFSLTLIDALDTLLILGNVSEFQRVVEVLQDSVDFDIDVNASVFETNIRVVGGLLSAHLLSKKAGVEVEAGWPCSGPLLRMAEEAARKLLPAFQTPTGMPYGTVNLLHGVNPGETPVTCTAGIGTFIVEFATLSSLTGDPVFEDVARVALMRLWESRSDIGLVGNHIDVLTGKWVAQDAGIGAGVDSYFEYLVKGAILLQDKKLMAMFLE corresponding to amino acids 1-282 of CT31_HUMAN (SEQ ID NO:1459), which also corresponds to amino acids 1-282 of R38144_PEA_(—)2_P15 (SEQ ID NO:1405), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence PHWRH (SEQ ID NO: 270) corresponding to amino acids 283-287 of R38144_PEA_(—)2_P15 (SEQ ID NO:1405), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R38144_PEA_(—)2_P15 (SEQ ID NO:1405), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence PHWRH (SEQ ID NO: 270) in R38144_PEA_(—)2_P15 (SEQ ID NO:1405).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding forR38144_PEA_(—)2_P19 (SEQ ID NO:1406), comprising a first amino acidsequence being at least 90% homologous toMPFRLLIPLGLLCALLPQHHGAPGPDGSAPDPAHYRERVKAMFYHAYDSYLENAFPFDELRPLTCDGHDTWGSFSLTLIDALDTLLILGNVSEFQRVVEVLQDSVDFDIDVNASVFETNIRVVGGLLSAHLLSKKAGVEVEAGWPCSGPLLRMAEEAARKLLPAFQTPTGMPYGTVNLLHGVNPGETPVTCTAGIGTFIVEFATLSSLTGDPVFEDVARVALMRLWESRSDIGLVGNHIDVLTGKWVAQDAGIGAGVDSYFEYLVKGAILLQDKKLMAMFLEYNKAIRNYTRFDDWYLWVQMYKGTVSMPVFQSLEAYWPGLQSLIGDIDNAMRTFLNYYTVWKQFGGLPEFYNIPQGYTVEKREGYPLRPELIESAMYLYRATGDPTLLELGRDAVESIEKISKVECGFATcorresponding to amino acids 1-412 of CT31_HUMAN (SEQ ID NO:1459), whichalso corresponds to amino acids 1-412 of R38144_PEA_(—)2_P19 (SEQ IDNO:1406), and a second amino acid sequence being at least 70%,optionally at least 80%, preferably at least 85%, more preferably atleast 90% and most preferably at least 95% homologous to a polypeptidehaving the sequence KRSRSVAQAGVQWCDHDSPQP (SEQ ID NO: 270) correspondingto amino acids 413-433 of R38144_PEA_(—)2_P19 (SEQ ID NO:1406), whereinsaid first amino acid sequence and second amino acid sequence arecontiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R38144_PEA_(—)2_P19 (SEQ ID NO:1406), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KRSRSVAQAGVQWCDHDSPQP (SEQ ID NO: 270) in R38144_PEA_(—)2_P19 (SEQ ID NO:1406).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding forR38144_PEA_(—)2_P24 (SEQ ID NO:1407), comprising a first amino acidsequence being at least 90% homologous toMPFRLLIPLGLLCALLPQHHGAPGPDGSAPDPAHYRERVKAMFYHAYDSYLENAFPFDELRPLTCDGHDTWGSFSLTLIDALDTLLILGNVSEFQRVVEVLQDSVDFDIDVNASVFETNIR corresponding toamino acids 1-121 of CT31_HUMAN (SEQ ID NO:1459), which also correspondsto amino acids 1-121 of R38144_PEA_(—)2_P24 (SEQ ID NO:1407), and asecond amino acid sequence being at least 90% homologous toEYNKAIRNYTRFDDWYLWVQMYKGTVSMPVFQSLEAYWPGLQSLIGDIDNAMRTFLNYYTVWKQFGGLPEFYNIPQGYTVEKREGYPLRPELIESAMYLYRATGDPTLLELGRDAVESIEKISKVECGFATIKDLRDHKLDNRMESFFLAETVKYLYLLFDPTNFIHNNGSTFDAVITPYGECILGAGGYIFNTEAHPIDPAALHCCQRLKEEQWEVEDLMREFYSLKRSRSKFQKNTVSSGPWEPPARPGTLFSPENHDQARERKPAKQKVPLLSCPSQPFTSKLALLGQVFLDSS corresponding to amino acids 282-578 of CT31_HUMAN (SEQID NO:1459), which also corresponds to amino acids 122-418 ofR38144_PEA_(—)2_P24 (SEQ ID NO:1407), wherein said first amino acidsequence and second amino acid sequence are contiguous and in asequential order.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of R38144_PEA_(—)2_P24 (SEQ ID NO:1407), comprising a polypeptide having a length “n”, wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise RE, having a structure as follows: a sequence starting from any of amino acid numbers 121−x to 121; and ending at any of amino acid numbers 122+((n−2)−x), in which x varies from 0 to n−2.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R38144_PEA_(—)2_P36 (SEQ ID NO:1408), comprising a first amino acid sequence being at least 90% homologous to MPFRLLIPLGLLCALLPQHHGAPGPDGSAPDPAHYR corresponding to amino acids 1-36 of AAH16184 (SEQ ID NO:1460), which also corresponds to amino acids 1-36 of R38144_PEA_(—)2_P36 (SEQ ID NO:1408), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence FWGMSQNSKEWLKCSRTAWTLILM (SEQ ID NO: 272) corresponding to amino acids 37-60 of R38144_PEA_(—)2_P36 (SEQ ID NO:1408), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R38144_PEA_(—)2_P36 (SEQ ID NO:1408), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence FWGMSQNSKEWLKCSRTAWTLILM (SEQ ID NO: 272) in R38144_PEA_(—)2_P36 (SEQ ID NO:1408).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R38144_PEA_(—)2_P36 (SEQ ID NO:1408), comprising a first amino acid sequence being at least 90% homologous to MPFRLLIPLGLLCALLPQHHGAPGPDGSAPDPAHY corresponding to amino acids 1-35 of AAQ88943 (SEQ ID NO:1461), which also corresponds to amino acids 1-35 of R38144_PEA_(—)2_P36 (SEQ ID NO:1408), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence RFWGMSQNSKEWLKCSRTAWTLILM corresponding to amino acids 36-60 of R38144_PEA_(—)2_P36 (SEQ ID NO:1408), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R38144_PEA_2_P36 (SEQ ID NO:1408), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence RFWGMSQNSKEWLKCSRTAWTLILM in R38144_PEA_2_P36 (SEQ ID NO:1408).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R38144_PEA_2_P36 (SEQ ID NO:1408), comprising a first amino acid sequence being at least 90% homologous to MPFRLLIPLGLLCALLPQHHGAPGPDGSAPDPAHYR corresponding to amino acids 1-36 of CT31_HUMAN (SEQ ID NO:1459), which also corresponds to amino acids 1-36 of R38144_PEA_2_P36 (SEQ ID NO:1408), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence FWGMSQNSKEWLKCSRTAWTLILM (SEQ ID NO: 272) corresponding to amino acids 37-60 of R38144_PEA_2_P36 (SEQ ID NO:1408), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R38144_PEA_2_P36 (SEQ ID NO:1408), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence FWGMSQNSKEWLKCSRTAWTLILM (SEQ ID NO: 272) in R38144_PEA_2_P36 (SEQ ID NO:1408).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding for AA161187_P6 (SEQID NO:1319), comprising a first amino acid sequence being at least 70%,optionally at least 80%, preferably at least 85%, more preferably atleast 90% and most preferably at least 95% homologous to a polypeptidehaving the sequence HTREGTLGGQKRAFPDGVEGEKGRGRAWGAASRGSAVPLTIR (SEQ IDNO: 273) corresponding to amino acids 1-42 of AA161187_P6 (SEQ IDNO:1319), and a second amino acid sequence being at least 90% homologousto GPCGRRVITSRIVGGEDAELGRWPWQGSLRLWDSHVCGVSLLSHRWALTAAHCFETYSDLSDPSGWMVQFGQLTSMPSFWSLQAYYTRYFVSNIYLSPRYLGNSPYDIALVKLSAPVTYTKHIQPICLQASTFEFENRTDCWVTGWGYIKEDEALPSPHTLQEVQVAIINNSMCNHLFLKYSFRKDIFGDMVCAGNAQGGKDACFGDSGGPLACNKNGLWYQIGVVSWGVGCGRPNRPGVYTNISHHFEWIQKLMAQSGMSQPDPSWPLLFFPLLWALPLLGPV corresponding to amino acids 31-314 of TEST_HUMAN (SEQ IDNO:1431), which also corresponds to amino acids 43-326 of AA161187_P6(SEQ ID NO:1319), wherein said first amino acid sequence and secondamino acid sequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of AA161187_P6 (SEQ ID NO:1319), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence HTREGTLGGQKRAFPDGVEGEKGRGRAWGAASRGSAVPLTIR (SEQ ID NO: 273) of AA161187_P6 (SEQ ID NO:1319).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding for AA161187_P13 (SEQID NO:1320), comprising a first amino acid sequence being at least 90%homologous toMGARGALLLALLLARAGLRKPESQEAAPLSGPCGRRVITSRIVGGEDAELGRWPWQGSLRLWDSHVCGVSLLSHRWALTAAHCFETYSDLSDPSGWMVQFGQLTSMPSFWSLQAYYTRYFVSNIYLSPRYLGNSPYDIALVKLSAPVTYTKHIQPICLQASTFEFENRTDCWVTGWGYIKEDE corresponding to aminoacids 1-183 of TEST_HUMAN (SEQ ID NO:1431), which also corresponds toamino acids 1-183 of AA161187_P13 (SEQ ID NO:1320), and a second aminoacid sequence being at least 70%, optionally at least 80%, preferably atleast 85%, more preferably at least 90% and most preferably at least 95%homologous to a polypeptide having the sequenceGSSGRHHKQLYVQPPLPQVQFPQGHLWRHG (SEQ ID NO: 274) corresponding to aminoacids 184-213 of AA161187_P13 (SEQ ID NO:1320), wherein said first aminoacid sequence and second amino acid sequence are contiguous and in asequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of AA161187_P13 (SEQ ID NO:1320), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GSSGRHHKQLYVQPPLPQVQFPQGHLWRHG (SEQ ID NO: 274) in AA161187_P13 (SEQ ID NO:1320).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding for AA161187_P14 (SEQID NO:1321), comprising a first amino acid sequence being at least 90%homologous toMGARGALLLALLLARAGLRKPESQEAAPLSGPCGRRVITSRIVGGEDAELGRWPWQGSLRLWDSHVCGVSLLSHRWALTAAHCFETYSDLSDPSGWMVQFGQLTSMPSFWSLQAYYTRYFVSNIYLSPRYLGNSPYDIALVKLSAPVTYTKHIQPICLQASTFEFENRTDCWVTGWGYIKEDE corresponding to aminoacids 1-183 of TEST_HUMAN (SEQ ID NO:1431), which also corresponds toamino acids 1-183 of AA161187_P14 (SEQ ID NO:1321), and a second aminoacid sequence being at least 70%, optionally at least 80%, preferably atleast 85%, more preferably at least 90% and most preferably at least 95%homologous to a polypeptide having the sequenceGCCLSPSHYRPHSTAISPHPPGSSGRHHKQLYVQPPLPQVQFPQGHLWRHGLCWQCPRREGCLLRECPCHHSQPRKASCVPVPYLTLMPTPGGGDCCPTLQMQKRRLGCCQGEEEDVHPVYPAP (SEQ ID NO: 275)corresponding to amino acids 184-307 of AA161187_P14 (SEQ ID NO:1321),wherein said first amino acid sequence and second amino acid sequenceare contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of AA161187_P14 (SEQ ID NO:1321), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GCCLSPSHYRPHSTAISPHPPGSSGRHHKQLYVQPPLPQVQFPQGHLWRHGLCWQCPRREGCLLRECPCHHSQPRKASCVPVPYLTLMPTPGGGDCCPTLQMQKRRLGCCQGEEEDVHPVYPAP (SEQ ID NO: 275) in AA161187_P14 (SEQ ID NO:1321).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding for AA161187_P18 (SEQID NO:1322), comprising a first amino acid sequence being at least 70%,optionally at least 80%, preferably at least 85%, more preferably atleast 90% and most preferably at least 95% homologous to a polypeptidehaving the sequence HTREGTLGGQKRAFPDGVEGEKGRGRAWGAASRGSAVPLTIR (SEQ IDNO: 273) corresponding to amino acids 1-42 of AA161187_P18 (SEQ IDNO:1322), a second amino acid sequence being at least 90% homologous toGPCGRRVITSRIVGGEDAELGRWPWQGSLRLWDSHVCGVSLLSHRWALTAAHCFET correspondingto amino acids 31-86 of TEST_HUMAN (SEQ ID NO:1431), which alsocorresponds to amino acids 43-98 of AA161187_P18 (SEQ ID NO:1322), athird amino acid sequence being at least 90% homologous toDLSDPSGWMVQFGQLTSMPSFWSLQAYYTRYFVSNIYLSPRYLGNSPYDIALVKLSAPVTYTKHIQPICLQASTFEFENRTDCWVTGWGYIKEDEALPSPHTLQEVQVAIINNSMCNHLFLKYSFRKDIFGDMVCAGNAQGGKDACF corresponding to amino acids 89-235 of TEST_HUMAN (SEQ IDNO:1431), which also corresponds to amino acids 99-245 of AA161187_P18(SEQ ID NO:1322), and a fourth amino acid sequence being at least 70%,optionally at least 80%, preferably at least 85%, more preferably atleast 90% and most preferably at least 95% homologous to a polypeptidehaving the sequence VSVPATTPSPGKHPVSLCLI (SEQ ID NO: 277) correspondingto amino acids 246-265 of AA161187_P18 (SEQ ID NO:1322), wherein saidfirst amino acid sequence, second amino acid sequence, third amino acidsequence and fourth amino acid sequence are contiguous and in asequential order.
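The coordinate bookkeeping recited for AA161187_P18 can be checked mechanically. The following Python sketch is illustrative only and is not part of the disclosed methods; the `Segment` type and the `check_segments` helper are hypothetical names introduced for this example. It takes the four recited segments, with the TEST_HUMAN range where one is given, and confirms that they tile amino acids 1-265 without gaps and that each mapped segment has the same length in both coordinate systems.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Range = Tuple[int, int]  # 1-based, inclusive (start, end)


@dataclass(frozen=True)
class Segment:
    variant: Range               # position in the variant polypeptide
    reference: Optional[Range]   # position in the known protein, or None if unique


def span(r: Range) -> int:
    """Number of residues covered by an inclusive range."""
    return r[1] - r[0] + 1


def check_segments(segments: List[Segment]) -> int:
    """Return the total length if segments are contiguous and length-consistent."""
    expected_start = 1
    for seg in segments:
        if seg.variant[0] != expected_start:
            raise ValueError(f"gap or overlap before variant position {seg.variant[0]}")
        if seg.reference is not None and span(seg.reference) != span(seg.variant):
            raise ValueError(f"length mismatch for variant range {seg.variant}")
        expected_start = seg.variant[1] + 1
    return expected_start - 1


# Coordinates as recited for AA161187_P18 against TEST_HUMAN.
AA161187_P18 = [
    Segment((1, 42), None),          # unique head
    Segment((43, 98), (31, 86)),     # shared with TEST_HUMAN
    Segment((99, 245), (89, 235)),   # shared with TEST_HUMAN
    Segment((246, 265), None),       # unique tail
]

total_length = check_segments(AA161187_P18)
print(total_length)
```

On the recited coordinates, the TEST_HUMAN ranges step from 86 directly to 89, so amino acids 87-88 of TEST_HUMAN have no counterpart in the variant, while the variant numbering itself remains continuous.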

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of AA161187_P18 (SEQ ID NO:1322), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence HTREGTLGGQKRAFPDGVEGEKGRGRAWGAASRGSAVPLTIR (SEQ ID NO: 273) of AA161187_P18 (SEQ ID NO:1322).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of AA161187_P18 (SEQ ID NO:1322), comprising a polypeptide having a length “n”, wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise TD, having a structure as follows: a sequence starting from any of amino acid numbers 98−x to 98; and ending at any of amino acid numbers 99+((n−2)−x), in which x varies from 0 to n−2.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of AA161187_P18 (SEQ ID NO:1322), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VSVPATTPSPGKHPVSLCLI (SEQ ID NO: 277) in AA161187_P18 (SEQ ID NO:1322).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding for AA161187_P19 (SEQID NO:1323), comprising a first amino acid sequence being at least 90%homologous toMGARGALLLALLLARAGLRKPESQEAAPLSGPCGRRVITSRIVGGEDAELGRWPWQGSLRLWDSHVCGVSLLSHRWALTAAHCFETYSDLSDPSGWMVQFGQLTSMPSFWSLQAYYTRYFVSNIYLSPRYLGNSPYDIALVKLSAPVTYTKHIQPICLQASTFEFENRTDCWVTGWGYIKEDE corresponding to aminoacids 1-183 of TEST_HUMAN (SEQ ID NO:1431), which also corresponds toamino acids 1-183 of AA161187_P19 (SEQ ID NO:1323), and a second aminoacid sequence being at least 70%, optionally at least 80%, preferably atleast 85%, more preferably at least 90% and most preferably at least 95%homologous to a polypeptide having the sequence DKRTQ (SEQ ID NO: 278)corresponding to amino acids 184-188 of AA161187_P19 (SEQ ID NO:1323),wherein said first amino acid sequence and second amino acid sequenceare contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of AA161187_P19 (SEQ ID NO:1323), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DKRTQ (SEQ ID NO: 278) in AA161187_P19 (SEQ ID NO:1323).
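For a short tail such as DKRTQ, the recited percentage thresholds translate into small whole numbers of residues. The Python sketch below is illustrative only. It assumes, purely for the sake of the example, that homology is scored as ungapped percent identity between two equal-length sequences; this is a simplification and may differ from the alignment-based determination of homology used elsewhere in this document. The function names `percent_identity` and `min_matches` are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity of two equal-length sequences (assumed scoring)."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def min_matches(length: int, threshold_percent: int) -> int:
    """Smallest number of identical residues that meets the threshold."""
    # Integer ceiling of length * threshold / 100, avoiding float rounding.
    return (length * threshold_percent + 99) // 100


TAIL = "DKRTQ"  # SEQ ID NO: 278, as recited above

# Thresholds recited in the text: 70, 80, 85, 90 and 95 percent.
required = {t: min_matches(len(TAIL), t) for t in (70, 80, 85, 90, 95)}
print(required)
```

Under this simplified scoring, a five-residue tail can differ from DKRTQ at no more than one position to satisfy the 70% or 80% thresholds, and must be identical to satisfy the 85%, 90% or 95% thresholds.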

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding forZ25299_PEA_(—)2_P2 (SEQ ID NO:1390), comprising a first amino acidsequence being at least 90% homologous toMKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPECQSDWQCPGKKRCCPDTCGIKCLDPVDTPNPTRRKPGKCPVTYGQCLMLNPPNFCEMDGQCKRDLKCCMGMCGKSCVSPVKcorresponding to amino acids 1-131 of ALK1_HUMAN (SEQ ID NO:1454), whichalso corresponds to amino acids 1-131 of Z25299_PEA_(—)2_P2 (SEQ IDNO:1390), and a second amino acid sequence being at least 70%,optionally at least 80%, preferably at least 85%, more preferably atleast 90% and most preferably at least 95% homologous to a polypeptidehaving the sequence GKQGMRAH (SEQ ID NO: 279) corresponding to aminoacids 132-139 of Z25299_PEA_(—)2_P2 (SEQ ID NO:1390), wherein said firstand second amino acid sequences are contiguous and in a sequentialorder.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of Z25299_PEA_2_P2 (SEQ ID NO:1390), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GKQGMRAH (SEQ ID NO: 279) in Z25299_PEA_2_P2 (SEQ ID NO:1390).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding forZ25299_PEA_(—)2_P3 (SEQ ID NO:1391), comprising a first amino acidsequence being at least 90% homologous toMKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPECQSDWQCPGKKRCCPDTCGIKCLDPVDTPNPTRRKPGKCPVTYGQCLMLNPPNFCEMDGQCKRDLKCCMGMCGKSCVSPVKcorresponding to amino acids 1-131 of ALK1_HUMAN (SEQ ID NO:1454), whichalso corresponds to amino acids 1-131 of Z25299_PEA_(—)2_P3 (SEQ IDNO:1391), and a second amino acid sequence being at least 70%,optionally at least 80%, preferably at least 85%, more preferably atleast 90% and most preferably at least 95% homologous to a polypeptidehaving the sequence GEKRHHKQLRDQEVDPLEMRRHSAG (SEQ ID NO: 269)corresponding to amino acids 132-156 of Z25299_PEA_(—)2_P3 (SEQ IDNO:1391), wherein said first and second amino acid sequences arecontiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of Z25299_PEA_2_P3 (SEQ ID NO:1391), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GEKRHHKQLRDQEVDPLEMRRHSAG (SEQ ID NO: 269) in Z25299_PEA_2_P3 (SEQ ID NO:1391).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for Z25299_PEA_2_P7 (SEQ ID NO:1392), comprising a first amino acid sequence being at least 90% homologous to MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPECQSDWQCPGKKRCCPDTCGIKCLDPVDTPNP corresponding to amino acids 1-81 of ALK1_HUMAN (SEQ ID NO:1454), which also corresponds to amino acids 1-81 of Z25299_PEA_2_P7 (SEQ ID NO:1392), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence RGSLGSAQ (SEQ ID NO: 622) corresponding to amino acids 82-89 of Z25299_PEA_2_P7 (SEQ ID NO:1392), wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of Z25299_PEA_2_P7 (SEQ ID NO:1392), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence RGSLGSAQ (SEQ ID NO: 622) in Z25299_PEA_2_P7 (SEQ ID NO:1392).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for Z25299_PEA_2_P10 (SEQ ID NO:1393), comprising a first amino acid sequence being at least 90% homologous to MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPECQSDWQCPGKKRCCPDTCGIKCLDPVDTPNPT corresponding to amino acids 1-82 of ALK1_HUMAN (SEQ ID NO:1454), which also corresponds to amino acids 1-82 of Z25299_PEA_2_P10 (SEQ ID NO:1393).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding for R66178_P3 (SEQ IDNO:1324), comprising a first amino acid sequence being at least 90%homologous toMARMGLAGAAGRWWGLALGLTAFFLPGVHSQVVQVNDSMYGFIGTDVVLHCSFANPLPSVKITQVTWQKSTNGSKQNVAIYNPSMGVSVLAPYRERVEFLRPSFTDGTIRLSRLELEDEGVYICEFATFPTGNRESQLNLTVMAKPTNWIEGTQAVLRAKKGQDDKVLVATCTSANGKPPSVVSWETRLKGEAEYQEIRNPNGTVTVISRYRLVPSREAHQQSLACIVNYHMDRFKESLTLNVQYEPEVTIEGFDGNWYLQRMDVKLTCKADANPPATEYHWTTLNGSLPKGVEAQNRTLFFKGPINYSLAGTYICEATNPIGTRSGQVEVNIT correspondingto amino acids 1-334 of PVR1_HUMAN (SEQ ID NO:1432), which alsocorresponds to amino acids 1-334 of R66178_P3 (SEQ ID NO:1324), and asecond amino acid sequence being at least 70%, optionally at least 80%,preferably at least 85%, more preferably at least 90% and mostpreferably at least 95% homologous to a polypeptide having the sequenceGEGHSLPISPGVLQTQNCGP (SEQ ID NO: 694) corresponding to amino acids335-354 of R66178_P3 (SEQ ID NO:1324), wherein said first amino acidsequence and second amino acid sequence are contiguous and in asequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R66178_P3 (SEQ ID NO:1324), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GEGHSLPISPGVLQTQNCGP (SEQ ID NO: 694) in R66178_P3 (SEQ ID NO:1324).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding for R66178_P4 (SEQ IDNO:1325), comprising a first amino acid sequence being at least 90%homologous toMARMGLAGAAGRWWGLALGLTAFFLPGVHSQVVQVNDSMYGFIGTDVVLHCSFANPLPSVKITQVTWQKSTNGSKQNVAIYNPSMGVSVLAPYRERVEFLRPSFTDGTIRLSRLELEDEGVYICEFATFPTGNRESQLNLTVMAKPTNWIEGTQAVLRAKKGQDDKVLVATCTSANGKPPSVVSWETRLKGEAEYQEIRNPNGTVTVISRYRLVPSREAHQQSLACIVNYHMDRFKESLTLNVQYEPEVTIEGFDGNWYLQRMDVKLTCKADANPPATEYHWTTLNGSLPKGVEAQNRTLFFKGPINYSLAGTYICEATNPIGTRSGQVEVNIT correspondingto amino acids 1-334 of PVR1_HUMAN (SEQ ID NO:1432), which alsocorresponds to amino acids 1-334 of R66178_P4 (SEQ ID NO:1325), and asecond amino acid sequence being at least 70%, optionally at least 80%,preferably at least 85%, more preferably at least 90% and mostpreferably at least 95% homologous to a polypeptide having the sequenceAFCQLIYPGKGRTRARMF (SEQ ID NO: 1702) corresponding to amino acids335-352 of R66178_P4 (SEQ ID NO:1325), wherein said first amino acidsequence and second amino acid sequence are contiguous and in asequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R66178_P4 (SEQ ID NO:1325), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence AFCQLIYPGKGRTRARMF (SEQ ID NO: 1702) in R66178_P4 (SEQ ID NO:1325).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding for R66178_P8 (SEQ IDNO:1326), comprising a first amino acid sequence being at least 90%homologous toMARMGLAGAAGRWWGLALGLTAFFLPGVHSQVVQVNDSMYGFIGTDVVLHCSFANPLPSVKITQVTWQKSTNGSKQNVAIYNPSMGVSVLAPYRERVEFLRPSFTDGTIRLSRLELEDEGVYICEFATFPTGNRESQLNLTVMAKPTNWIEGTQAVLRAKKGQDDKVLVATCTSANGKPPSVVSWETRLKGEAEYQEIRNPNGTVTVISRYRLVPSREAHQQSLACIVNYHMDRFKESLTLNVQYEPEVTIEGFDGNWYLQRMDVKLTCKADANPPATEYHWTTLNGSLPKGVEAQNRTLFFKGPINYSLAGTYICEATNPIGTRSGQVE corresponding toamino acids 1-330 of PVR1_HUMAN (SEQ ID NO:1432), which also correspondsto amino acids 1-330 of R66178_P8 (SEQ ID NO:1326), and a second aminoacid sequence being at least 70%, optionally at least 80%, preferably atleast 85%, more preferably at least 90% and most preferably at least 95%homologous to a polypeptide having the sequenceNSPTPRLLPNMGGAPGRCPRPSLGAWRGASCWC (SEQ ID NO: 1717) corresponding toamino acids 331-363 of R66178_P8 (SEQ ID NO:1326), wherein said firstamino acid sequence and second amino acid sequence are contiguous and ina sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R66178_P8 (SEQ ID NO:1326), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NSPTPRLLPNMGGAPGRCPRPSLGAWRGASCWC (SEQ ID NO: 1717) in R66178_P8 (SEQ ID NO:1326).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSU33147_PEA_1_P5 (SEQ ID NO:1415), comprising a first amino acid sequence being at least 90% homologous to MKLLMVLMLAALSQHCYAGSGCPLLENVISKTINPQVSKTEYKELLQEFIDDNATTNAIDELKECFLNQTDETLSNVE corresponding to amino acids 1-78 of MGBA_HUMAN (SEQ ID NO:1416), which also corresponds to amino acids 1-78 of HSU33147_PEA_1_P5 (SEQ ID NO:1415), and a second amino acid sequence being at least 90% homologous to QLIYDSSLCDLF corresponding to amino acids 82-93 of MGBA_HUMAN (SEQ ID NO:1416), which also corresponds to amino acids 79-90 of HSU33147_PEA_1_P5 (SEQ ID NO:1415), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
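The statement that the two amino acid sequences are contiguous and in a sequential order can be illustrated by joining them and inspecting the junction. The Python sketch below is illustrative only and is not part of the disclosed methods; the `assemble` helper is a hypothetical name. The two sequences are copied from the paragraph above, with the first written in blocks of ten residues so that its length is easy to verify by eye.

```python
from typing import List, Tuple

# First amino acid sequence (recited as amino acids 1-78), in blocks of ten.
HEAD = (
    "MKLLMVLMLA" "ALSQHCYAGS" "GCPLLENVIS" "KTINPQVSKT"
    "EYKELLQEFI" "DDNATTNAID" "ELKECFLNQT" "DETLSNVE"
)
# Second amino acid sequence (recited as amino acids 79-90 of the variant).
TAIL = "QLIYDSSLCDLF"


def assemble(parts: List[Tuple[str, Tuple[int, int]]]) -> str:
    """Join parts after checking each fills its recited 1-based inclusive range."""
    expected_start = 1
    for seq, (start, end) in parts:
        if start != expected_start:
            raise ValueError(f"part starting at {start} is not contiguous")
        if len(seq) != end - start + 1:
            raise ValueError(f"sequence length does not match range {start}-{end}")
        expected_start = end + 1
    return "".join(seq for seq, _ in parts)


chimera = assemble([(HEAD, (1, 78)), (TAIL, (79, 90))])
junction = chimera[77:79]  # residues 78 and 79 in 1-based numbering
print(len(chimera), junction)
```

The joined sequence is 90 residues long, and the two residues spanning the junction are E (position 78) and Q (position 79). On the recited coordinates, amino acids 79-81 of MGBA_HUMAN have no counterpart in the variant.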

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HSU33147_PEA_1_P5 (SEQ ID NO:1415), comprising a polypeptide having a length “n”, wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise EQ, having a structure as follows: a sequence starting from any of amino acid numbers 78−x to 78; and ending at any of amino acid numbers 79+((n−2)−x), in which x varies from 0 to n−2.

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding forM78076_PEA_(—)1_P3 (SEQ ID NO:1350), comprising a first amino acidsequence being at least 90% homologous toMGPASPAARGLSRRPGQPPLPLLLPLLLLLLRAQPAIGSLAGGSPGAAEAPGSAQVAGLCGRLTLHRDLRTGRWEPDPQRSRRCLRDPQRVLEYCRQMYPELQIARVEQATQAIPMERWCGGSRSGSCAHPHHQVVPFRCLPGEFVSEALLVPEGCRFLHQERMDQCESSTRRHQEAQEACSSQGLILHGSGMLLPCGSDRFRGVEYVCCPPPGTPDPSGTAVGDPSTRSWPPGSRVEGAEDEEEEESFPQPVDDYFVEPPQAEEEEETVPPPSSHTLAVVGKVTPTPRPTDGVDIYFGMPGEISEHEGFLRAKMDLEERRMRQINEVMREWAMADNQSKNLPKADRQALNEHFQSILQTLEEQVSGERQRLVETHATRVIALINDQRRAALEGFLAALQADPPQAERVLLALRRYLRAEQKEQRHTLRHYQHVAAVDPEKAQQMRFQVHTHLQVIEERVNQSLGLLDQNPHLAQELRPQIQELLHSEHLGPSELEAPAPGGSSEDKGGLQPPDSKD corresponding to amino acids 1-517 ofAPP1_HUMAN (SEQ ID NO:1439), which also corresponds to amino acids 1-517of M78076_PEA_(—)1_P3 (SEQ ID NO:1350), and a second amino acid sequencebeing at least 70%, optionally at least 80%, preferably at least 85%,more preferably at least 90% and most preferably at least 95% homologousto a polypeptide having the sequence GE corresponding to amino acids518-519 of M78076_PEA_(—)1_P3 (SEQ ID NO:1350), wherein said first aminoacid sequence and second amino acid sequence are contiguous and in asequential order.

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding forM78076_PEA_(—)1_P4 (SEQ ID NO:1351), comprising a first amino acidsequence being at least 90% homologous toMGPASPAARGLSRRPGQPPLPLLLPLLLLLLRAQPAIGSLAGGSPGAAEAPGSAQVAGLCGRLTLHRDLRTGRWEPDPQRSRRCLRDPQRVLEYCRQMYPELQIARVEQATQAIPMERWCGGSRSGSCAHPHHQVVPFRCLPGEFVSEALLVPEGCRFLHQERMDQCESSTRRHQEAQEACSSQGLILHGSGMLLPCGSDRFRGVEYVCCPPPGTPDPSGTAVGDPSTRSWPPGSRVEGAEDEEEEESFPQPVDDYFVEPPQAEEEEETVPPPSSHTLAVVGKVTPTPRPTDGVDIYFGMPGEISEHEGFLRAKMDLEERRMRQINEVMREWAMADNQSKNLPKADRQALNEHFQSILQTLEEQVSGERQRLVETHATRVIALINDQRRAALEGFLAALQADPPQAERVLLALRRYLRAEQKEQRHTLRHYQHVAAVDPEKAQQMRFQVHTHLQVIEERVNQSLGLLDQNPHLAQELRPQIQELLHSEHLGPSELEAPAPGGSSEDKGGLQPPDSKDDTPMTLPKG corresponding to amino acids 1-526 ofAPP1_HUMAN (SEQ ID NO:1439), which also corresponds to amino acids 1-526of M78076_PEA_(—)1_P4 (SEQ ID NO:1351), and a second amino acid sequencebeing at least 70%, optionally at least 80%, preferably at least 85%,more preferably at least 90% and most preferably at least 95% homologousto a polypeptide having the sequence ECLTVNPSLQIPLNP (SEQ ID NO: 1718)corresponding to amino acids 527-541 of M78076_PEA_(—)1_P4 (SEQ IDNO:1351), wherein said first amino acid sequence and second amino acidsequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of M78076_PEA_1_P4 (SEQ ID NO:1351), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ECLTVNPSLQIPLNP (SEQ ID NO: 1718) in M78076_PEA_1_P4 (SEQ ID NO:1351).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding forM78076_PEA_(—)1_P12 (SEQ ID NO:1352), comprising a first amino acidsequence being at least 90% homologous toMGPASPAARGLSRRPGQPPLPLLLPLLLLLLRAQPAIGSLAGGSPGAAEAPGSAQVAGLCGRLTLHRDLRTGRWEPDPQRSRRCLRDPQRVLEYCRQMYPELQIARVEQATQAIPMERWCGGSRSGSCAHPHHQVVPFRCLPGEFVSEALLVPEGCRFLHQERMDQCESSTRRHQEAQEACSSQGLILHGSGMLLPCGSDRFRGVEYVCCPPPGTPDPSGTAVGDPSTRSWPPGSRVEGAEDEEEEESFPQPVDDYFVEPPQAEEEEETVPPPSSHTLAVVGKVTPTPRPTDGVDIYFGMPGEISEHEGFLRAKMDLEERRMRQINEVMREWAMADNQSKNLPKADRQALNEHFQSILQTLEEQVSGERQRLVETHATRVIALINDQRRAALEGFLAALQADPPQAERVLLALRRYLRAEQKEQRHTLRHYQHVAAVDPEKAQQMRFQVHTHLQVIEERVNQSLGLLDQNPHLAQELRPQIQELLHSEHLGPSELEAPAPGGSSEDKGGLQPPDSKDDTPMTLPKG corresponding to amino acids 1-526 ofAPP1_HUMAN (SEQ ID NO:1439), which also corresponds to amino acids 1-526of M78076_PEA_(—)1_P12 (SEQ ID NO:1352), and a second amino acidsequence being at least 70%, optionally at least 80%, preferably atleast 85%, more preferably at least 90% and most preferably at least 95%homologous to a polypeptide having the sequence ECVCSKGFPFPLIGDSEG (SEQID NO: 1719) corresponding to amino acids 527-544 of M78076_PEA_(—)1_P12(SEQ ID NO:1352), wherein said first amino acid sequence and secondamino acid sequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of M78076_PEA_1_P12 (SEQ ID NO:1352), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ECVCSKGFPFPLIGDSEG (SEQ ID NO: 1719) in M78076_PEA_1_P12 (SEQ ID NO:1352).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding forM78076_PEA_(—)1_P14 (SEQ ID NO:1353), comprising a first amino acidsequence being at least 90% homologous toMGPASPAARGLSRRPGQPPLPLLLPLLLLLLRAQPAIGSLAGGSPGAAEAPGSAQVAGLCGRLTLHRDLRTGRWEPDPQRSRRCLRDPQRVLEYCRQMYPELQIARVEQATQAIPMERWCGGSRSGSCAHPHHQVVPFRCLPGEFVSEALLVPEGCRFLHQERMDQCESSTRRHQEAQEACSSQGLILHGSGMLLPCGSDRFRGVEYVCCPPPGTPDPSGTAVGDPSTRSWPPGSRVEGAEDEEEEESFPQPVDDYFVEPPQAEEEEETVPPPSSHTLAVVGKVTPTPRPTDGVDIYFGMPGEISEHEGFLRAKMDLEERRMRQINEVMREWAMADNQSKNLPKADRQALNEHFQSILQTLEEQVSGERQRLVETHATRVIALINDQRRAALEGFLAALQADPPQAERVLLALRRYLRAEQKEQRHTLRHYQHVAAVDPEKAQQMRFQVHTHLQVIEERVNQSLGLLDQNPHLAQELRPQIQELLHSEHLGPSELEAPAPGGSSEDKGGLQPPDSKDDTPMTLPKGSTEQDAASPEKEKMNPLEQYERKVNASVPRGFPFHSSEIQRDEL corresponding to amino acids 1-570 of APP1_HUMAN (SEQ IDNO:1439), which also corresponds to amino acids 1-570 ofM78076_PEA_(—)1_P14 (SEQ ID NO:1353), and a second amino acid sequencebeing at least 70%, optionally at least 80%, preferably at least 85%,more preferably at least 90% and most preferably at least 95% homologousto a polypeptide having the sequenceVRGGTAGYLGEETRGQRPGCDSQSHTGPSKKPSAPSPLPAGTSWDRGVP (SEQ ID NO: 1720)corresponding to amino acids 571-619 of M78076_PEA_(—)1_P14 (SEQ IDNO:1353), wherein said first amino acid sequence and second amino acidsequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of M78076_PEA_1_P14 (SEQ ID NO:1353), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRGGTAGYLGEETRGQRPGCDSQSHTGPSKKPSAPSPLPAGTSWDRGVP (SEQ ID NO: 1720) in M78076_PEA_1_P14 (SEQ ID NO:1353).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding forM78076_PEA_(—)1_P21 (SEQ ID NO:1354), comprising a first amino acidsequence being at least 90% homologous toMGPASPAARGLSRRPGQPPLPLLLPLLLLLLRAQPAIGSLAGGSPGAAEAPGSAQVAGLCGRLTLHRDLRTGRWEPDPQRSRRCLRDPQRVLEYCRQMYPELQIARVEQATQAIPMERWCGGSRSGSCAHPHHQVVPFRCLPGEFVSEALLVPEGCRFLHQERMDQCESSTRRHQEAQEACSSQGLILHGSGMLLPCGSDRFRGVEYVCCPPPGTPDPSGTAVGDPSTRSWPPGSRVEGAEDEEEEESFPQPVDDYFVEPPQAEEEEETVPPPSSHTLAVVGKVTPTPRPTDGVDIYFGMPGEISEHEGFLRAKMDLEERRMRQINEVMREWAMADNQSKNLPKADRQALNEcorresponding to amino acids 1-352 of APP1_HUMAN (SEQ ID NO:1439), whichalso corresponds to amino acids 1-352 of M78076_PEA_(—)1_P21 (SEQ IDNO:1354), and a second amino acid sequence being at least 90% homologousto AERVLLALRRYLRAEQKEQRHTLRHYQHVAAVDPEKAQQMRFQVHTHLQVIEERVNQSLGLLDQNPHLAQELRPQIQELLHSEHLGPSELEAPAPGGSSEDKGGLQPPDSKDDTPMTLPKGSTEQDAASPEKEKMNPLEQYERKVNASVPRGFPFHSSEIQRDELAPAGTGVSREAVSGLLIMGAGGGSLIVLSMLLLRRKKPYGAISHGVVEVDPMLTLEEQQLRELQRHGYENPTYRFLEERP corresponding to amino acids 406-650of APP1_HUMAN (SEQ ID NO:1439), which also corresponds to amino acids353-597 of M78076_PEA_(—)1_P21 (SEQ ID NO:1354), wherein said firstamino acid sequence and second amino acid sequence are contiguous and ina sequential order.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of M78076_PEA_1_P21 (SEQ ID NO:1354), comprising a polypeptide having a length “n”, wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise EA, having a structure as follows: a sequence starting from any of amino acid numbers 352−x to 352; and ending at any of amino acid numbers 353+((n−2)−x), in which x varies from 0 to n−2.
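The start and end formula for an edge portion describes every window of length n that contains both residues flanking the junction. The Python sketch below is illustrative only; `edge_windows` is a hypothetical helper name, and discarding windows that would run past either end of the polypeptide is an assumption of this example rather than something stated in the text. The junction (352/353) and the overall length (597) are the values recited for this variant.

```python
from typing import List, Tuple


def edge_windows(left: int, n: int, protein_length: int) -> List[Tuple[int, int]]:
    """Enumerate 1-based inclusive (start, end) windows of length n spanning a junction.

    `left` is the last residue before the junction; the junction pair is
    (left, left + 1). For each x in 0..n-2 the window starts at left - x and
    ends at (left + 1) + ((n - 2) - x), as in the recited formula.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to span both junction residues")
    right = left + 1
    windows = []
    for x in range(n - 1):  # x = 0, 1, ..., n-2
        start = left - x
        end = right + (n - 2) - x
        # Assumption of this sketch: keep only windows inside the polypeptide.
        if start >= 1 and end <= protein_length:
            windows.append((start, end))
    return windows


# Junction EA at positions 352/353 of a 597-residue variant, with n = 10.
windows = edge_windows(left=352, n=10, protein_length=597)
for start, end in windows:
    print(start, end)
```

Every window produced has exactly n residues, because the end minus the start plus one reduces to n for any x. For n = 10 this yields nine windows, from 352-361 (x = 0) down to 344-353 (x = 8), each of which contains both position 352 and position 353.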

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding forM78076_PEA_(—)1_P24 (SEQ ID NO:1355), comprising a first amino acidsequence being at least 90% homologous toMGPASPAARGLSRRPGQPPLPLLLPLLLLLRAQPAIGSLAGGSPGAAEAPGSAQVAGLCGRLTLHRDLRTGRWEPDPQRSRRCLRDPQRVLEYCRQMYPELQIARVEQATQAIPMERWCGGSRSGSCAHPHHQVVPFRCLPGEFVSEALLVPEGCRFLHQERMDQCESSTRRHQEAQEACSSQGLILHGSGMLLPCGSDRFRGVEYVCCPPPGTPDPSGTAVGDPSTRSWPPGSRVEGAEDEEEEESFPQPVDDYFVEPPQAEEEEETVPPPSSHTLAVVGKVTPTPRPTDGVDIYFGMPGEISEHEGFLRAKMDLEERRMRQINEVMREWAMADNQSKNLPKADRQALNEHFQSILQTLEEQVSGERQRLVETHATRVIALINDQRRAALEGFLAALQADPPQAERVLLALRRYLRAEQKEQRHTLRHYQHVAAVDPEKAQQMRFQVHTHLQVIEERVNQSLGLLDQNPHLAQELRPQI correspondingto amino acids 1-481 of APP1_HUMAN (SEQ ID NO:1439), which alsocorresponds to amino acids 1-481 of M78076_PEA_(—)1_P24 (SEQ IDNO:1355), and a second amino acid sequence being at least 70%,optionally at least 80%, preferably at least 85%, more preferably atleast 90% and most preferably at least 95% homologous to a polypeptidehaving the sequence RECLLPWLPLQISEGRS (SEQ ID NO: 1721) corresponding toamino acids 482-498 of M78076_PEA_(—)1_P24 (SEQ ID NO:1355), whereinsaid first amino acid sequence and second amino acid sequence arecontiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of M78076_PEA_(—)1_P24 (SEQ ID NO:1355), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence RECLLPWLPLQISEGRS (SEQ ID NO: 1721) in M78076_PEA_(—)1_P24 (SEQ ID NO:1355).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding forM78076_PEA_(—)1_P2 (SEQ ID NO:1356), comprising a first amino acidsequence being at least 90% homologous toMGPASPAARGLSRRPGQPPLPLLLPLLLLLLRAQPAIGSLAGGSPGAAEAPGSAQVAGLCGRLTLHRDLRTGRWEPDPQRSRRCLRDPQRVLEYCRQMYPELQIARVEQATQAIPMERWCGGSRSGSCAHPHHQVVPFRCLPGEFVSEALLVPEGCRFLHQERMDQCESSTRRHQEAQEACSSQGLILHGSGMLLPCGSDRFRGVEYVCCPPPGTPDPSGTAVGDPSTRSWPPGSRVEGAEDEEEEESFPQPVDDYFVEPPQAEEEEETVPPPSSHTLAVVGKVTPTPRPTDGVDIYFGMPGEISEHEGFLRAKMDLEERRMRQINEVMREWAMADNQSKNLPKADRQALNEHFQSILQTLEEQVSGERQRLVETHATRVIALINDQRRAALEGFLAALQADPPQAERVLLALRRYLRAEQKEQRHTLRHYQHVAAVDPEKAQQMRFQV corresponding to amino acids 1-449 ofAPP1_HUMAN (SEQ ID NO:1439), which also corresponds to amino acids 1-449of M78076_PEA_(—)1_P2 (SEQ ID NO:1356), and a second amino acid sequencebeing at least 70%, optionally at least 80%, preferably at least 85%,more preferably at least 90% and most preferably at least 95% homologousto a polypeptide having the sequenceLTSFQLPNAPLFLRRPRLRLFSCPLDPLSVSWTPSYPLNTASLPLPSLSAQLPDPETWTLTCCVFDPCFLALGFLLPPPSILCSVPWIFTAFPRIVFFFFFFFLRQVLALSPRQESSVRSWLIATSTSWVQAILLPQPLE (SEQID NO: 1722) corresponding to amino acids 450-588 of M78076_PEA_(—)1_P2(SEQ ID NO:1356), wherein said first amino acid sequence and secondamino acid sequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of M78076_PEA_(—)1_P2 (SEQ ID NO:1356), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LTSFQLPNAPLFLRRPRLRLFSCPLDPLSVSWTPSYPLNTASLPLPSLSAQLPDPETWTLTCCVFDPCFLALGFLLPPPSILCSVPWIFTAFPRIVFFFFFFLRQVLALSPRQESSVRSWLIATSTSWVQAILLPQPLE (SEQ ID NO: 1722) in M78076_PEA_(—)1_P2 (SEQ ID NO:1356).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding forM78076_PEA_(—)1_P25 (SEQ ID NO:1357), comprising a first amino acidsequence being at least 90% homologous toMGPASPAARGLSRRPGQPPLPLLLPLLLLLLRAQPAIGSLAGGSPGAAEAPGSAQVAGLCGRLTLHRDLRTGRWEPDPQRSRRCLRDPQRVLEYCRQMYPELQIARVEQATQAIPMERWCGGSRSGSCAHPHHQVVPFRCLPGEFVSEALLVPEGCRFLHQERMDQCESSTRRHQEAQEACSSQGLILHGSGMLLPCGSDRFRGVEYVCCPPPGTPDPSGTAVGDPSTRSWPPGSRVEGAEDEEEEESFPQPVDDYFVEPPQAEEEEETVPPPSSHTLAVVGKVTPTPRPTDGVDIYFGMPGEISEHEGFLRAKMDLEERRMRQINEVMREWAMADNQSKNLPKADRQALNEHFQSILQTLEEQVSGERQRLVETHATRVIALINDQRRAALEGFLAALQADPPQAERVLLALRRYLRAEQKEQRHTLRHYQHVAAVDPEKAQQMRFQ corresponding to amino acids 1-448 ofAPP1_HUMAN (SEQ ID NO:1439), which also corresponds to amino acids 1-448of M78076_PEA_(—)1_P25 (SEQ ID NO:1357), and a second amino acidsequence being at least 70%, optionally at least 80%, preferably atleast 85%, more preferably at least 90% and most preferably at least 95%homologous to a polypeptide having the sequencePQNPNSQPRAAGSLEVIISHPFVRRLEILISPFQFQNSIPKNSQIVPAASPRGTSSP (SEQ ID NO:1723) corresponding to amino acids 449-505 of M78076_PEA_(—)1_P25 (SEQID NO:1357), wherein said first amino acid sequence and second aminoacid sequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of M78076_PEA_(—)1_P25 (SEQ ID NO:1357), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence PQNPNSQPRAAGSLEVIISHPFVRRLEILISPFQFQNSIPKNSQIVPAASPRGTSSP (SEQ ID NO:1723) in M78076_PEA_(—)1_P25 (SEQ ID NO:1357).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding forM79217_PEA_(—)1_P1 (SEQ ID NO:1336), comprising a first amino acidsequence being at least 90% homologous toMTGYTMLRNGGAGNGGQTCMLRWSNRIRLTWLSFTLFVILVFFPLIAHYYLTTLDEADEAGKRIFGPRVGNELCEVKHVLDLCRIRESVSEELLQLEAKRQELNSEIAKLNLKIEACKKSIENAKQDLLQLKNVISQTEHSYKELMAQNQPKLSLPIRLLPEKDDAGLPPPICATRGCRLHNCFDYSRCPLTSGFPVYVYDSDQFVFGSYLDPLVKQAFQATARANVYVTENADIACLYVILVGEMQEPVVLRPAELEKQLYSLPHWRTDGHNHVIINLSRKSDTQNLLYNVSTGRAMVAQSTFYTVQYRPGFDLVVSPLVHAMSEPNFMEIPPQVPVKRKYLFTFQGEKIESLRSSLQEARSFEEEMEGDPPADYDDRIIATLKAVQDSKLDQVLVEFTCKNQPKPSLPTEWALCGEREDRLELLKLSTFALIITPGDPRLVISSGCATRLFEALEVGAVPVVLGEQVQLPYQDMLQWNEAALVVPKPRVTEVHFLLRSLSDSDLLAMRRQGRFLWETYFSTADSIFNTVLAMIRTRIQIPAAPIREEAAAEIPHRSGKAAGTDPNMADNGDLDLGPVETEPPYASPRYLRNFTLTVTDFYRSWNCAPGPFHLFPHTPFDPVLPSEAKFLGSGTGFRPIGGGAGGSGKEFQAALGGNVPREQFTVVMLTYEREEVLMNSLERLNGLPYLNKVVVVWNSPKLPLPSEDLLWPDIGVPIMVVRTEKNSLNNRFLPWNEIETEAILSIDDDAHLRHDEIMFGFRVWREARDRIVGFPGRYHAWDIPHQSWLYNSNYSCELSMVLTGAAFFHKYYAYLYSYVMPQAIRDMVDEYINCEDIAMNFLVSHITRKPPIKVTSRWTFRCPGCPQALSHDDSHFHERHKCINFFVKVYGYMPLLYTQFRVDSVLFKTRLPHDKTKCFKFIcorresponding to amino acids 13-931 of BAA25445 (SEQ ID NO:1437), whichalso corresponds to amino acids 1-919 of M79217_PEA1_P1 (SEQ IDNO:1336).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding forM79217_PEA_(—)1_P2 (SEQ ID NO:1337), comprising a first amino acidsequence being at least 90% homologous toMTGYTMLRNGGAGNGGQTCMLRWSNRIRLTWLSFTLFVILVFFPLIAHYYLTTLDEADEAGKRIFGPRVGNELCEVKHVLDLCRIRESVSEELLQLEAKRQELNSEIAKLNLKIEACKKSIENAKQDLLQLKNVISQTEHSYKELMAQNQPKLSLPIRLLPEKDDAGLPPPKATRGCRLHNCFDYSRCPLTSGFPVYVYDSDQFVFGSYLDPLVKQAFQATARANVYVTENADIACLYVILVGEMQEPVVLRPAELEKQLYSLPHWRTDGHNHVIINLSRKSDTQNLLYNVSTGRAMVAQSTFYTVQYRPGFDLVVSPLVHAMSEPNFMEIPPQVPVKRKYLFTFQGEKIESLRSSLQEARSFEEEMEGDPPADYDDRIIATLKAVQDSKLDQVLVEFTCKNQPKPSLPTEWALCGEREDRLELLKLSTFALIITPGDPRLVISSGCATRLFEALEVGAVPVVLGEQVQLPYQDMLQWNEAALVVPKPRVTEVHFLLRSLSDSDLLAMRRQGRFLWETYFSTADSIFNTVLAMIRTRIQIPAAPIREEAAAEIPHRSGKAAGTDPNMADNGDLDLGPVETEPPYASPRYLRNFTLTVTDFYRSWNCAPGPFHLFPHTPFDPVLPSEAKFLGSGTGFRPIGGGAGGSGKEFQAALGGNVPREQFTVVMLTYEREEVLMNSLERLNGLPYLNKVVVVWNSPKLPSEDLLWPDIGVPIMVVRTEKNSLNNRFLPWNEIETEAILSIDDDAHLRHDEIMFGFRVWREARDRIVGFPGRYHAWDIPHQSWLYNSNYSCELSMVLTGAAFFHK corresponding to amino acids 1-807 ofEXL3_HUMAN (SEQ ID NO:1436), which also corresponds to amino acids 1-807of M79217_PEA_(—)1_P2 (SEQ ID NO:1337), and a second amino acid sequencebeing at least 90% homologous toAIRDMVDEYINCEDIAMNFLVSHITRKPPIKVTSRWTFRCPGCPQALSHDDSHFHERHKCINFFVKVYGYMPLLYTQFRVDSVLFKTRLPHDKTKCFKFI corresponding to amino acids 820-919 ofEXL3_HUMAN (SEQ ID NO:1436), which also corresponds to amino acids808-907 of M79217_PEA_(—)1_P2 (SEQ ID NO:1337), wherein said first aminoacid sequence and second amino acid sequence are contiguous and in asequential order.
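The coordinate correspondences recited for a chimeric polypeptide can be checked arithmetically. The sketch below assumes only that the recited reference ranges are laid end to end in the variant, as the phrase "contiguous and in a sequential order" suggests; the function name `variant_coordinates` is hypothetical.

```python
def variant_coordinates(reference_segments):
    """Lay reference (start, end) ranges end to end and return the
    1-based inclusive ranges they occupy in the variant polypeptide."""
    mapped = []
    next_start = 1
    for ref_start, ref_end in reference_segments:
        length = ref_end - ref_start + 1
        mapped.append((next_start, next_start + length - 1))
        next_start += length
    return mapped


# Reference ranges recited for M79217_PEA_1_P2 against EXL3_HUMAN
p2_segments = [(1, 807), (820, 919)]
p2_mapped = variant_coordinates(p2_segments)

# Number of reference residues absent between the two segments
skipped = p2_segments[1][0] - p2_segments[0][1] - 1

print(p2_mapped)
print(skipped)
```

With the ranges recited above, the second segment lands on variant residues 808-907, which agrees with the text, and the junction falls between variant residues 807 and 808.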

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of M79217_PEA_(—)1_P2 (SEQ ID NO:1337), comprising a polypeptide having a length “n”, wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise KA, having a structure as follows: a sequence starting from any of amino acid numbers 807−x to 807; and ending at any of amino acid numbers 808+((n−2)−x), in which x varies from 0 to n−2.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M79217_PEA_(—)1_P4 (SEQ ID NO:1338), comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence PELRQPARLGLPECWDYRHEPRCPAQMGSHFIVQAGLKLLASSKPPKCWDY (SEQ ID NO: 1724) corresponding to amino acids 1-51 of M79217_PEA_(—)1_P4 (SEQ ID NO:1338), and a second amino acid sequence being at least 90% homologous to RVWREARDRIVGFPGRYHAWDIPHQSWLYNSNYSCELSMVLTGAAFFHKYYAYLYSYVMPQAIRDMVDEYINCEDIAMNFLVSHITRKPPIKVTSRWTFRCPGCPQALSHDDSHFHERHKCINFFVKVYGYMPLLYTQFRVDSVLFKTRLPHDKTKCFKFI corresponding to amino acids 759-919 of EXL3_HUMAN (SEQ ID NO:1436), which also corresponds to amino acids 52-212 of M79217_PEA_(—)1_P4 (SEQ ID NO:1338), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of M79217_PEA_(—)1_P4 (SEQ ID NO:1338), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence PELRQPARLGLPECWDYRHEPRCPAQMGSHFIVQAGLKLLASSKPPKCWDY (SEQ ID NO: 1724) of M79217_PEA_(—)1_P4 (SEQ ID NO:1338).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding forM79217_PEA_(—)1_P8 (SEQ ID NO:1339), comprising a first amino acidsequence being at least 90% homologous toMTGYTMLRNGGAGNGGQTCMLRWSNRIRLTWLSFTLFVILVFFPLIAHYYLTTLDEADEAGKRIFGPRVGNELCEVKHVLDLCRIRESVSEELLQLEAKRQELNSEIAKLNLKIEACKKSIENAKQDLLQLKNVISQTEHSYKELMAQNQPKLSLPIRLLPEKDDAGLPPPKATRGCRLHNCFDYSRCPLTSGFPVYVYDSDQFVFGSYLDPLVKQAFQATARANVYVTENADIACLYVILVGEMQEPVVLRPAELEKQLYSLPHWRTDGHNHVIINLSRKSDTQNLLYNVSTGRAMVAQSTFYTVQYRPGFDLVVSPLVHAMSEPNFMEIPPQVPVKRKYLFTFQGEKIESLRSSLQEARSFEEEMEGDPPADYDDRHATLKAVQDSKLDQVLVEFTCKNQPKPSLPTEWALCGEREDRLELLKLSTFALIITPGDPRLVISSGCATRLFEALEVGAVPVVLGEQVQLPYQDMLQWNEAALVVPKPRVTEVHFLLRSLSDSDLLAMRRQGRFLWETYFSTADSIFNTVLAMIRTRIQIPAAPIREEAAAEIPHRSGKAAGTDPNMADNGDLDLGPVETEPPYASPRYLRNFTLTVTDFYRSWNCAPGPFHLFPHTPFDPVLPSEAKFLGSGTGFRPIGGGAGGSGKEFQAALGGNVPREQFTVVMLTYEREEVLMNSLERLNGLPYLNKVVVVWNSPKLPSEDLLWPDIGVPIMVVRTEKNSLNNRFLPWNEIETEAILSIDDDAHLRHDEIMFGFRVWREARDRIVGFPGRYHAWDIPHQSWLYNSNYSCELSMVLTGAAFFHK corresponding to amino acids 1-807 ofEXL3_HUMAN (SEQ ID NO:1436), which also corresponds to amino acids 1-807of M79217_PEA_(—)1_P8 (SEQ ID NO:1339), and a second amino acid sequencebeing at least 70%, optionally at least 80%, preferably at least 85%,more preferably at least 90% and most preferably at least 95% homologousto a polypeptide having the sequence VRKSW (SEQ ID NO: 1725)corresponding to amino acids 808-812 of M79217_PEA_(—)1_P8 (SEQ IDNO:1339), wherein said first amino acid sequence and second amino acidsequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of M79217_PEA_(—)1_P8 (SEQ ID NO:1339), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRKSW (SEQ ID NO: 1725) in M79217_PEA_(—)1_P8 (SEQ ID NO:1339).
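The tiered homology thresholds (70%, 80%, 85%, 90%, 95%) recited for head and tail sequences can be illustrated with a simple calculation. The sketch below uses ungapped, position-by-position percent identity between two equal-length peptides as a stand-in for homology; this is an assumption made for illustration only, since this passage does not state how homology is computed, and an alignment-based measure may be intended. The names `percent_identity` and `thresholds_met` are hypothetical.

```python
THRESHOLDS = (70, 80, 85, 90, 95)  # tiers recited in the text


def percent_identity(candidate: str, reference: str) -> float:
    """Ungapped percent identity between two equal-length peptides.

    Illustrative stand-in for "homologous"; not the specification's method.
    """
    if not reference or len(candidate) != len(reference):
        raise ValueError("this sketch requires non-empty, equal-length sequences")
    matches = sum(a == b for a, b in zip(candidate, reference))
    return 100.0 * matches / len(reference)


def thresholds_met(pct: float) -> list[int]:
    """Return the recited tiers that a given percentage satisfies."""
    return [t for t in THRESHOLDS if pct >= t]


TAIL_P8 = "VRKSW"  # SEQ ID NO: 1725 as recited above

one_substitution = percent_identity("VRKAW", TAIL_P8)
print(one_substitution, thresholds_met(one_substitution))
```

For a five-residue tail such as VRKSW, a single substitution already lowers this measure to 80%, so under this illustrative measure only the 70% and 80% tiers are met by any non-identical peptide of the same length.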

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding forM62096_PEA_(—)1_P4 (SEQ ID NO:1341), comprising a first amino acidsequence being at least 70%, optionally at least 80%, preferably atleast 85%, more preferably at least 90% and most preferably at least 95%homologous to a polypeptide having the sequence MATYIH (SEQ ID NO: 1726)corresponding to amino acids 1-6 of M62096_PEA_(—)1_P4 (SEQ ID NO:1341),and a second amino acid sequence being at least 90% homologous toVSKTGAEGAVLDEAKNINKSLSALGNVISALAEGTKTHVPYRDSKMTRILQDSLGGNCRTTIVICCSPSVFNEAETKSTLMFGQRAKTIKNTVSVNLELTAEEWKKKYEKEKEKNKTLKNVIQHLEMELNRWRNGEAVPEDEQISAKDQKNLEPCDNTPIIDNIAPVVAGISTEEKEKYDEEISSLYRQLDDKDDEINQQSQLAEKLKQQMLDQDELLASTRRDYEKIQEELTRLQIENEAAKDEVKEVLQALEELAVNYDQKSQEVEDKTRANEQLTDELAQKTTTLTTTQRELSQLQELSNHQKKRATEILNLLLKDLGEIGGIIGTNDVKTLADVNGVIEEEFTMARLYISKMKSEVKSLVNRSKQLESAQMDSNRKMNASERELAACQLLISQHEAKIKSLTDYMQNMEQKRRQLEESQDSLSEELAKLRAQEKMHEVSFQDKEKEHLTRLQDAEEMKKALEQQMESHREAHQKQLSRLRDEIEEKQKIIDEIRDLNQKLQLEQEKLSSDYNKLKIEDQEREMKLEKLLLLNDKREQAREDLKGLEETVSRELQTLHNLRKLFVQDLTTRVKKSVELDNDDGGGSAAQKQKISFLENNLEQLTKVHKQLVRDNADLRCELPKLEKRLRATAERVKALESALKEAKENAMRDRKRYQQEVDRIKEAVRAKNMARRAHSAQIAKPIRPGHYPASSPTAVHAIRGGGGSSSNSTHYQK corresponding to amino acids 239-957 of KF5C_HUMAN(SEQ ID NO:1438), which also corresponds to amino acids 7-725 ofM62096_PEA_(—)1_P4 (SEQ ID NO:1341), wherein said first amino acidsequence and second amino acid sequence are contiguous and in asequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of M62096_PEA_(—)1_P4 (SEQ ID NO:1341), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MATYIH (SEQ ID NO: 1726) of M62096_PEA_(—)1_P4 (SEQ ID NO:1341).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding forM62096_PEA_(—)1_P5 (SEQ ID NO:1342), comprising a first amino acidsequence being at least 90% homologous toMTRILQDSLGGNCRTTIVICCSPSVFNEAETKSTLMFGQRAKTIKNTVSVNLELTAEEWKKKYEKEKEKNKTLKNVIQHLEMELNRWRNGEAVPEDEQISAKDQKNLEPCDNTPIIDNIAPVVAGISTEEKEKYDEEISSLYRQLDDKDDEINQQSQLAEKLKQQMLDQDELLASTRRDYEKIQEELTRLQIENEAAKDEVKEVLQALEELAVNYDQKSQEVEDKTRANEQLTDELAQKTTTLTTTQRELSQLQELSNHQKKRATEILNLLLKDLGEIGGIIGTNDVKTLADVNGVIEEEFTMARLYISKMKSEVKSLVNRSKQLESAQMDSNRKMNASERELAACQLLISQHEAKIKSLTDYMQNMEQKRRQLEESQDSLSEELAKLRAQEKMHEVSFQDKEKEHLTRLQDAEEMKKALEQQMESHREAHQKQLSRLRDEIEEKQKIIDEIRDLNQKLQLEQEKLSSDYNKLKIEDQEREMKLEKLLLLNDKREQAREDLKGLEETVSRELQTLHNLRKLFVQDLTTRVKKSVELDNDDGGGSAAQKQKISFLENNLEQLTKVHKQLVRDNADLRCELPKLEKRLRATAERVKALESALKEAKENAMRDRKRYQQEVDRIKEAVRAKNMARRAHSAQIAKPIRPGHYPASSPTAVHAIRGGGGSSSNSTHYQK corresponding to amino acids284-957 of KF5C_HUMAN (SEQ ID NO:1438), which also corresponds to aminoacids 1-674 of M62096_PEA_(—)1_P5 (SEQ ID NO:1342).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding forM62096_PEA_(—)1_P3 (SEQ ID NO:1343), comprising a first amino acidsequence being at least 90% homologous toMELNRWRNGEAVPEDEQISAKDQKNLEPCDNTPIIDNIAPVVAGISTEEKEKYDEEISSLYRQLDDKDDEINQQSQLAEKLKQQMLDQDELLASTRRDYEKIQEELTRLQIENEAAKDEVKEVLQALEELAVNYDQKSQEVEDKTRANEQLTDELAQKTTTLTTTQRELSQLQELSNHQKKRATEILNLLLKDLGEIGGIIGTNDVKTLADVNGVIEEEFTMARLYISKMKSEVKSLVNRSKQLESAQMDSNRKMNASERELAACQLLISQHEAKIKSLTDYMQNMEQKRRQLEESQDSLSEELAKLRAQEKMHEVSFQDKEKEHLTRLQDAEEMKKALEQQMESHREAHQKQLSRLRDEIEEKQKIIDEIRDLNQKLQLEQEKLSSDYNKLKIEDQEREMKLEKLLLLNDKREQAREDLKGLEETVSRELQTLHNLRKLFVQDLTTRVKKSVELDNDDGGGSAAQKQKISFLENNLEQLTKVHKQLVRDNADLRCELPKLEKRLRATAERVKALESALKEAKENAMRDRKRYQQEVDRIKEAVRAKNMARRAHSAQIAKPIRPGHYPASSPTAVHAIRGGGGSSSNSTHYQK corresponding to amino acids 365-957of KF5C_HUMAN (SEQ ID NO:1438), which also corresponds to amino acids1-593 of M62096_PEA_(—)1_P3 (SEQ ID NO:1343).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding forM62096_PEA_(—)1_P7 (SEQ ID NO:1344), comprising a first amino acidsequence being at least 70%, optionally at least 80%, preferably atleast 85%, more preferably at least 90% and most preferably at least 95%homologous to a polypeptide having the sequence MTQNFRLMWNILLFPLNFS (SEQID NO: 1727) corresponding to amino acids 1-19 of M62096_PEA_(—)1_P7(SEQ ID NO:1344), and a second amino acid sequence being at least 90%homologous toLNQKLQLEQEKLSSDYNKLKIEDQEREMKLEKLLLLNDKREQAREDLKGLEETVSRELQTLHNLRKLFVQDLTTRVKKSVELDNDDGGGSAAQKQKISFLENNLEQLTKVHKQLVRDNADLRCELPKLEKRLRATAERVKALESALKEAKENAMRDRKRYQQEVDRIKEAVRAKNMARRAHSAQIAKPIRPGHYPASSPTAVHAIRGGGGSSSNSTHYQK corresponding to amino acids 738-957 of KF5C_HUMAN (SEQ IDNO:1438), which also corresponds to amino acids 20-239 ofM62096_PEA_(—)1_P7 (SEQ ID NO:1344), wherein said first amino acidsequence and second amino acid sequence are contiguous and in asequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of M62096_PEA_(—)1_P7 (SEQ ID NO:1344), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MTQNFRLMWNILLFPLNFS (SEQ ID NO: 1727) of M62096_PEA_(—)1_P7 (SEQ ID NO:1344).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding forM62096_PEA_(—)1_P8 (SEQ ID NO:1345), comprising a first amino acidsequence being at least 90% homologous toMADPAECSIKVMCRFRPLNEAEILRGDKFIPKFKGDETVVIGQGKPYVFDRVLPPNTTQEQVYNACAKQIVKDVLEGYNGTIFAYGQTSSGKTHTMEGKLHDPQLMGIIPRIAHDIFDHIYSMDENLEFHIKVSYFEIYLDKIRDLLDVSKTNLAVHEDKNRVPYVKGCTERFVSSPEEVMDVIDEGKANRHVAVTNMNEHSSRSHSIFLINIKQENVETEKKLSGKLYLVDLAGSEKVSKTGAEGAVLDEAKNINKSLSALGNVISALAEGTKTHVPYRDSKMTRILQDSLGGNCRTTIVICCSPSVFNEAETKSTLMFGQRAKTIKNTVSVNLELTAEEWKKKYEKEKEKNKTLKNVIQHLEMELNRWRNGEAVPEDEQISAKDQKNLEPCDNTPIIDNIAPVVAGISTEEKEKYDEEISSLYRQLDDKDDEINQQSQLAEKLKQQMLDQDELLASTRRDYEKIQEELTRLQIENEAAKDEVKEVLQALEELAVNYDQKSQEVEDKTRANEQLTDELAQKTTTLTTTQRELSQLQELSNHQKKRATEILNLLLKDLGEIGGIIGTNDVKTLADVNGVIEEEFTMARLYISKMKSEVKSLVNRSKQLESAQMDSNRKMNASERELAACQLLISQHEAKIKSLTDYMQNMEQKRRQLEESQDSLSEELAKLRAQEKMHEVSFQDKEKEHLTRLQDAEEMKKALEQQMESHREAHQKQLSRLRDEIEEKQKIIDEIR corresponding to amino acids 1-736 ofKF5C_HUMAN (SEQ ID NO:1438), which also corresponds to amino acids 1-736of M62096_PEA_(—)1_P8 (SEQ ID NO:1345), and a second amino acid sequencebeing at least 70%, optionally at least 80%, preferably at least 85%,more preferably at least 90% and most preferably at least 95% homologousto a polypeptide having the sequence E corresponding to amino acids737-737 of M62096_PEA_(—)1_P8 (SEQ ID NO:1345), wherein said first aminoacid sequence and second amino acid sequence are contiguous and in asequential order.

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding forM62096_PEA_(—)1_P9 (SEQ ID NO:1346), comprising a first amino acidsequence being at least 90% homologous toMADPAECSIKVMCRFRPLNEAEILRGDKFIPKFKGDETVVIGQGKPYVFDRVLPPNTTQEQVYNACAKQIVKDVLEGYNGTIFAYGQTSSGKTHTMEGKLHDPQLMGIIPRIAHDIFDHIYSMDENLEFHIKVSYFEIYLDKIRDLLDVSKTNLAVHEDKNRVPYVKGCTERFVSSPEEVMDVIDEGKANRHVAVTNMNEHSSRSHSIFLINIKQENVETEKKLSGKLYLVDLAGSEKVSKTGAEGAVLDEAKNINKSLSALGNVISALAEGTKTHVPYRDSKMTRILQDSLGGNCRTTIVICCSPSVFNEAETKSTLMFGQRAKTIKNTVSVNLELTAEEWKKKYEKEKEKNKTLKNVIQHLEMELNRWRNGEAVPEDEQISAKDQKNLEPCDNTPIIDNIAPVVAGISTEEKEKYDEEISSLYRQLDDKDDEINQQSQLAEKLKQQMLDQDE corresponding to amino acids 1-454 ofKF5C_HUMAN (SEQ ID NO:1438), which also corresponds to amino acids 1-454of M62096_PEA_(—)1_P9 (SEQ ID NO:1346), and a second amino acid sequencebeing at least 70%, optionally at least 80%, preferably at least 85%,more preferably at least 90% and most preferably at least 95% homologousto a polypeptide having the sequenceVKNAIYFFFHKVLLLLFVVDVCSRNLIGIEAFHNYRIMWKFLGRCPFTASYKLIITEFRK (SEQ ID NO:1728) corresponding to amino acids 455-514 of M62096_PEA_(—)1_P9 (SEQ IDNO:1346), wherein said first amino acid sequence and second amino acidsequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of M62096_PEA_(—)1_P9 (SEQ ID NO:1346), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VKNAIYFFFHKVLLLLFVVDVCSRNLIGIEAFHNYRIMWKFLGRCPFTASYKLIITEFRK (SEQ ID NO:1728) in M62096_PEA_(—)1_P9 (SEQ ID NO:1346).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M62096_PEA_(—)1_P10 (SEQ ID NO:1347), comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MTQNFRLMWNILLFPLNFS (SEQ ID NO: 1727) corresponding to amino acids 1-19 of M62096_PEA_(—)1_P10 (SEQ ID NO:1347), a second amino acid sequence being at least 90% homologous to LNQKLQLEQEKLSSDYNKLKIEDQEREMKLEKLLLLNDKREQAREDLKGLEETVSRELQTLHNLRKLFVQDLTTRVKK corresponding to amino acids 738-815 of KF5C_HUMAN (SEQ ID NO:1438), which also corresponds to amino acids 20-97 of M62096_PEA_(—)1_P10 (SEQ ID NO:1347), and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VSSLCLNGTEKKIKDGREESFSVEISLA (SEQ ID NO: 1730) corresponding to amino acids 98-125 of M62096_PEA_(—)1_P10 (SEQ ID NO:1347), wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of M62096_PEA_(—)1_P10 (SEQ ID NO:1347), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MTQNFRLMWNILLFPLNFS (SEQ ID NO: 1727) of M62096_PEA_(—)1_P10 (SEQ ID NO:1347).

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of M62096_PEA_(—)1_P10 (SEQ ID NO:1347), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VSSLCLNGTEKKIKDGREESFSVEISLA (SEQ ID NO:1730) in M62096_PEA_(—)1_P10 (SEQ ID NO:1347).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding forM62096_PEA_(—)1_P11 (SEQ ID NO:1348), comprising a first amino acidsequence being at least 90% homologous toMADPAECSIKVMCRFRPLNEAEILRGDKFIPKFKGDETVVIGQGKPYVFDRVLPPNTTQEQVYNACAKQIVKDVLEGYNGTIFAYGQTSSGKTHTMEGKLHDPQLMGHPRIAHDIFDHIYSMDENLEFHIKVSYFEIYLDKIRDLLDVSKTNLAVHEDKNRVPYVKGCTERFVSSPEEVMDVIDEGICANRHVAVTNMNEHSSRSHSIFLINIKQENVETEKKLSGKLYLVDLAGSEKVSKTGAEGAVLDEAKNINKSLSALGNVISALAEGTKTHVPYRDSKMTRILQDSLGGNCRTTIVICCSPSVFNEAETKSTLMFGQRAKTIKNTVSVNLELTAEEWKKKYEKEKEKNKTLKNVIQHLEMELNRWRN corresponding to amino acids 1-372 of KF5C_HUMAN (SEQID NO:1438), which also corresponds to amino acids 1-372 ofM62096_PEA_(—)1_P11 (SEQ ID NO:1348), and a second amino acid sequencebeing at least 70%, optionally at least 80%, preferably at least 85%,more preferably at least 90% and most preferably at least 95% homologousto a polypeptide having the sequence DFLAAHVFGKLLE (SEQ ID NO: 1731)corresponding to amino acids 373-385 of M62096_PEA_(—)1_P11 (SEQ IDNO:1348), wherein said first amino acid sequence and second amino acidsequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of M62096_PEA_(—)1_P11 (SEQ ID NO:1348), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DFLAAHVFGKLLE (SEQ ID NO: 1731) in M62096_PEA_(—)1_P11 (SEQ ID NO:1348).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding forM62096_PEA_(—)1_P12 (SEQ ID NO:1349), comprising a first amino acidsequence being at least 90% homologous toMADPAECSIKVMCRFRPLNEAEILRGDKFIPKFKGDETVVIGQGKPYVFDRVLPPNTTQEQVYNACAKQIVKDVLEGYNGTIFAYGQTSSGKTHTMEGKLHDPQLMGIIPRIAHDIFDHIYSMDENLEFHIKVSYFEIYLDKIRDLLDVSKTNLAVHEDKNRVPYVKGCTERFVSSPEEVMDVIDEGKANRHVAVTNMNEHSSRSHSIFLINIKQENVETEKKLSGKLYLVDLAGSEKVSKTGAEGAVLDEAKNINKSLSALGNVISALAEGTKTHVPYRDSKMTRILQDSLGGNCRTTIVICCSPSVFNEAETKSTLMFGQR corresponding to amino acids1-323 of KF5C_HUMAN (SEQ ID NO:1438), which also corresponds to aminoacids 1-323 of M62096_PEA_(—)1_P12 (SEQ ID NO:1349), and a second aminoacid sequence being at least 70%, optionally at least 80%, preferably atleast 85%, more preferably at least 90% and most preferably at least 95%homologous to a polypeptide having the sequence V corresponding to aminoacids 324-324 of M62096_PEA_(—)1_P12 (SEQ ID NO:1349), wherein saidfirst amino acid sequence and second amino acid sequence are contiguousand in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T99080_PEA_(—)4_P5 (SEQ ID NO:1360), comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MPASARLAGAGLLLAFLRALGCAGRAPGLS (SEQ ID NO: 1732) corresponding to amino acids 1-30 of T99080_PEA_(—)4_P5 (SEQ ID NO:1360), and a second amino acid sequence being at least 90% homologous to MAEGNTLISVDYEIFGKVQGVFFRKHTQAEGKKLGLVGWVQNTDRGTVQGQLQGPISKVRHMQEWLETRGSPKSHIDKANFNNEKVILKLDYSDFQIVK corresponding to amino acids 1-99 of ACYO_HUMAN_V1 (SEQ ID NO:1441), which also corresponds to amino acids 31-129 of T99080_PEA_(—)4_P5 (SEQ ID NO:1360), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of T99080_PEA_(—)4_P5 (SEQ ID NO:1360), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MPASARLAGAGLLLAFLRALGCAGRAPGLS (SEQ ID NO: 1732) of T99080_PEA_(—)4_P5 (SEQ ID NO:1360).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T99080_PEA_(—)4_P8 (SEQ ID NO:1361), comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence M corresponding to amino acids 1-1 of T99080_PEA_(—)4_P8 (SEQ ID NO:1361), and a second amino acid sequence being at least 90% homologous to QAEGKKLGLVGWVQNTDRGTVQGQLQGPISKVRHMQEWLETRGSPKSHIDKANFNNEKVILKLDYSDFQIVK corresponding to amino acids 28-99 of ACYO_HUMAN_V1 (SEQ ID NO:1441), which also corresponds to amino acids 2-73 of T99080_PEA_(—)4_P8 (SEQ ID NO:1361), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding forT08446_PEA_(—)1_P18 (SEQ ID NO:1370), comprising a first amino acidsequence being at least 90% homologous toMLSLSLCSHLWGPLILSALQARSTDSLDGPGEGSVQPLPTAGGPSVKGKPGKRLSAPRGPFPRLADCAHFHYENVDFGHIQLLLSPDREGPSLSGENELVFGVQVTCQGRSWPVLRSYDDFRSLDAHLHRCIFDRRFSCLPELPPPPEGARAAQMLVPLLLQYLETLSGLVDSNLNCGPVLTWME corresponding to amino acids1-185 of SNXQ_HUMAN (SEQ ID NO:1442), which also corresponds to aminoacids 1-185 of T08446_PEA_(—)1_P18 (SEQ ID NO:1370), and a second aminoacid sequence being at least 70%, optionally at least 80%, preferably atleast 85%, more preferably at least 90% and most preferably at least 95%homologous to a polypeptide having the sequenceLDNHGRRLLLSEEASLNIPAVAAAHVIKRYTAQAPDELSFEVGDIVSVIDMPPTEDRSWWRGKRGFQVGFFPSECVELFTERPGPGLKADADGPPCGIPAPQGISSLTSAVPRPRGKLAGLLRTFMRSRPSRQRLRQRGILRQRVFGCDLGEHLSNSGQDVPQVLRCCSEFIEAHGVVDGIYRLSGVSSNIQRLRHEFDSERIPELSGPAFLQDIHSVSSLCKLYFRELPNPLLTYQLYGKFSEAMSVPGEEERLVRVHDVIQQLPPPHYRTLEYLLRHLARMARHSANTSMHARNLAIVWAPNLLRSMELESVGMGGAAAFREVRVQSVVVEFLLTHVDVLFSDTFTSAGLDPAGRCLLPRPKSLAGSCPSTRLLTLEEAQARTQGRLGTPTEPTTPKAPASPAERRKGERGEKQRKPGGSSWKTFFALGRGPSVPRKKPLPWLGGTRAPPQPSGSRPDTVTLRSAKSEESLSSQASGAGLQRLHRLRRPHSSSDAFPVGPAPAGSCESLSSSSSESSSSESSSSSSESSAAGLGALSGSPSHRTSAWLDDGDELDFSPPRCLEGLRGLDFDPLTFRCSSPTPGDPAPPASPAPPAPASAFPPRVTPQAISPRGPTSPASPAALDISEPLAVSVPPAVLELLGAGGAPASATPTPALSPGRSLRPHLIPLLLRGAEAPLTDACQQEMCSKLRGAQGPLGPDMESPLPPPPLSLLRPGGAPPPPPKNPARLMALALAERAQQVAEQQSQQECGGTPPASQSPFHRSLSLEVGGEPLGTSGSGPPPNSLAHPGAWVPGPPPYLPRQQSDGSLLRSQRPMGTSRRGLRGPAQVSAQLRAGGGGRDAPEAAAQSPCSVPSQVPTPGFFSPAPRECLPPFLGVPKPGLYPLGPPSFQPSSPAPVWRSSLGPPAPLDRGENLYYEIGASEGSPYSGPTRSWSPFRSMPPDRLNASYGMLGQSPPLHRSPDFLLSYPPAPSCFPPDHLGYSAPQHPARRPTPPEPLYVNLALGPRGPSPASSSSSSPPAHPRSRSDPGPPVPRLPQKQRAPWGPRTPHRVPGPWGPPEPLLLYRAAPPAYGRGGELHRGSLYRNGGQRGEGAGPPPPYPTPSWSLHSEGQTRSYC (SEQ ID NO: 1733)corresponding to amino acids 186-1305 of T08446_PEA_(—)1_P18 (SEQ IDNO:1370), wherein 
said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there isprovided an isolated polypeptide encoding for a tail ofT08446_PEA_(—)1_P18 (SEQ ID NO:1370), comprising a polypeptide being atleast 70%, optionally at least about 80%, preferably at least about 85%,more preferably at least about 90% and most preferably at least about95% homologous to the sequenceLDNHGRRLLLSEEASLNIPAVAAAHVIKRYTAQAPDELSFEVGDIVSVIDMPPTEDRSWWRGKRGFQVGFFPSECVELFTERPGPGLKADADGPPCGIPAPQGISSLTSAVPRPRGKLAGLLRTFMRSRPSRQRLRQRGILRQRVFGCDLGEHLSNSGQDVPQVLRCCSEFIEAHGVVDGIYRLSGVSSNIQRLRHEFDSERIPELSGPAFLQDIHSVSSLCKLYFRELPNPLLTYQLYGKFSEAMSVPGEEERLVRVHDVIQQLPPPHYRTLEYLLRHLARMARHSANTSMHARNLAIVWAPNLLRSMELESVGMGGAAAFREVRVQSVVVEFLLTHVDVLFSDTFTSAGLDPAGRCLLPRPKSLAGSCPSTRLLTLEEAQARTQGRLGTPTEPTTPKAPASPAERRKGERGEKQRKPGGSSWKTFFALGRGPSVPRKKPLPWLGGTRAPPQPSGSRPDTVTLRSAKSEESLSSQASGAGLQRLHRLRRPHSSSDAFPVGPAPAGSCESLSSSSSSESSSSESSSSSSESSAAGLGALSGSPSHRTSAWLDDGDELDFSPPRCLEGLRGLDFDPLTFRCSSPTPGDPAPPASPAPPAPASAFPPRVTPQAISPRGPTSPASPAALDISEPLAVSVPPAVLELLGAGGAPASATPTPALSPGRSLRPHUPLLLRGAEAPLTDACQQEMCSKLRGAQGPLGPDMESPLPPPPLSLLRPGGAPPPPPKNPARLMALALAERAQQVAEQQSQQECGGTPPASQSPFHRSLSLEVGGEPLGTSGSGPPPNSLAHPGAWVPGPPPYLPRQQSDGSLLRSQRPMGTSRRGLRGPAQVSAQLRAGGGGRDAPEAAAQSPCSVPSQVPTPGFFSPAPRECLPPFLGVPKPGLYPLGPPSFQPSSPAPVWRSSLGPPAPLDRGENLYYEIGASEGSPYSGPTRSWSPFRSMPPDRLNASYGMLGQSPPLHRSPDFLLSYPPAPSCFPPDHLGYSAPQHPARRPTPPEPLYVNLALGPRGPSPASSSSSSPPAHPRSRSDPGPPVPRLPQKQRAPWGPRTPHRVPGPWGPPEPLLLYRAAPPAYGRGGELHRGSLYRNGGQRGEGAGPPPPYPTPSWSLHSEGQTRSYC (SEQ ID NO: 1733) inT08446_PEA_(—)1_P18 (SEQ ID NO:1370).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding forT08446_PEA_(—)1_P18 (SEQ ID NO:1370), comprising a first amino acidsequence being at least 70%, optionally at least 80%, preferably atleast 85%, more preferably at least 90% and most preferably at least 95%homologous to a polypeptide having the sequenceMLSLSLCSHLWGPLILSALQARSTDSLDGPGEGSVQPLPTAGGPSVKGKPGKRLSAPRGPFPRLADCAHFHYENVDFGHIQLLLSPDREGPSLSGENELVFGVQVTCQGRSWPVLRSYDDFRSLDAHLHRCIFDRRFSCLPELPPPPEGARAAQMLVPLLLQYLETLSGLVDSNLNCGPVLTWMELDNHGRRLLLSEEASLNIPAVAAAHVIKRYTAQAPDELSFEVGDIVSVIDMPPTEDRSWWRGKRGFQVGFFPSECVELFTERPGPGLKADADGPPCGIPAPQGISSLTSAVPRPRGKLAGLLRTFMRSRPSRQRLRQRGILRQRVFGCDLGEHLSNSGQDVPQVLRCCSEFIEAHGVVDGIYRLSGVSSNIQRLRHEFDSERIPELSGPAFLQDIHSVSSLCKLYFRELPNPLLTYQLYGKFSEAMSVPGEEERLVRV (SEQ ID NO: 1734) corresponding to amino acids 1-443 ofT08446_PEA_(—)1_P18 (SEQ ID NO:1370), a second amino acid sequence beingat least 90% homologous toHDVIQQLPPPHYRTLEYLLRHLARMARHSANTSMHARNLAIVWAPNLLRSMELESVGMGGAAAFREVRVQSVVVEFLLTHVDVLFSDTFTSAGLDPAGRCLLPRPKSLAGSCPSTRLLTLEEAQARTQGRLGTPTEPTTPKAPASPAERRKGERGEKQRKPGGSSWKTFFALGRGPSVPRKKPLPWLGGTRAPPQPSGSRPDTVTLRSAKSEESLSSQASGAGLQRLHRLRRPHSSSDAFPVGPAPAGSCESLSSSSSSESSSSESSSSSSESSAAGLGALSGSPSHRTSAWLDDGDELDFSPPRCLEGLRGLDFDPLTFRCSSPTPGDPAPPASPAPPAPASAFPPRVTPQAISPRGPTSPASPAALDISEPLAVSVPPAVLELLGAGGAPASATPTPALSPGRSLRPHLIPLLLRGAEAPLTDACQQEMCSKLRGAQGPLGPDMESPLPPPPLSLLRPGGAPPPPPKNPARLMALALAERAQQVAEQQSQQECGGTPPASQSPFHRSLSLEVGGEPLGTSGSGPPPNSLAHPGAWVPGPPPYLPRQQSDGSLLRSQRPMGTSRRGLRGPAQVSAQLRAGGGGRDAPEAAAQSPCSVPSQVPTPGFFSPAPRECLPPFLGVPKPGLYPLGPPSFQPSSPAPVWRSSLGPPAPLDRGENLYYEIGASEGSPYSG corresponding to amino acids 1-674 ofQ9NT23 (SEQ ID NO:1443), which also corresponds to amino acids 444-1117of T08446_PEA_(—)1_P18 (SEQ ID NO:1370), a bridging amino acid Pcorresponding to amino acid 1118 of T08446_PEA_(—)1_P18 (SEQ IDNO:1370), and a third amino acid sequence being at least 90% 
homologous to TRSWSPFRSMPPDRLNASYGMLGQSPPLHRSPDFLLSYPPAPSCFPPDHLGYSAPQHPARRPTPPEPLYVNLALGPRGPSPASSSSSSPPAHPRSRSDPGPPVPRLPQKQRAPWGPRTPHRVPGPWGPPEPLLLYRAAPPAYGRGGELHRGSLYRNGGQRGEGAGPPPPYPTPSWSLHSEGQTRSYC corresponding to amino acids 676-862 of Q9NT23 (SEQ ID NO:1443), which also corresponds to amino acids 1119-1305 of T08446_PEA_(—)1_P18 (SEQ ID NO:1370), wherein said first amino acid sequence, second amino acid sequence, bridging amino acid and third amino acid sequence are contiguous and in a sequential order.
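By way of non-limiting illustration only, the coordinate bookkeeping recited above for SEQ ID NO:1370 (a first sequence at amino acids 1-443, a second sequence at amino acids 444-1117 matching amino acids 1-674 of Q9NT23, a bridging amino acid at position 1118, and a third sequence at amino acids 1119-1305 matching amino acids 676-862 of Q9NT23) may be checked with a short Python sketch. The names `Segment` and `check_tiling` are hypothetical helpers introduced here and form no part of the original disclosure; the total length of 1305 is assumed from the last recited coordinate.

```python
from typing import NamedTuple, Optional, Sequence


class Segment(NamedTuple):
    """One recited portion of a chimeric polypeptide (1-based, inclusive)."""
    label: str
    start: int                       # position in the variant protein
    end: int
    ref_start: Optional[int] = None  # position in the known protein, if any
    ref_end: Optional[int] = None


def check_tiling(segments: Sequence[Segment], total_length: int) -> bool:
    """Verify that the segments are contiguous, in sequential order, cover
    positions 1..total_length, and that each mapped segment has the same
    length in the variant and in the reference protein."""
    expected_start = 1
    for seg in segments:
        if seg.start != expected_start:
            raise ValueError(f"{seg.label}: expected start {expected_start}, got {seg.start}")
        if seg.end < seg.start:
            raise ValueError(f"{seg.label}: end precedes start")
        if seg.ref_start is not None and seg.ref_end is not None:
            if seg.ref_end - seg.ref_start != seg.end - seg.start:
                raise ValueError(f"{seg.label}: variant and reference lengths differ")
        expected_start = seg.end + 1
    if expected_start - 1 != total_length:
        raise ValueError("segments do not cover the full length")
    return True


# Coordinates as recited in the text; the length 1305 is an assumption
# taken from the last recited position.
P18_SEGMENTS = [
    Segment("first", 1, 443),
    Segment("second", 444, 1117, ref_start=1, ref_end=674),
    Segment("bridging amino acid P", 1118, 1118),
    Segment("third", 1119, 1305, ref_start=676, ref_end=862),
]

if __name__ == "__main__":
    print(check_tiling(P18_SEGMENTS, 1305))
```

The sketch checks only the recited numbering; it does not compare the amino acid sequences themselves.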

According to preferred embodiments of the present invention, there isprovided an isolated polypeptide encoding for a head ofT08446_PEA_(—)1_P18 (SEQ ID NO:1370), comprising a polypeptide being atleast 70%, optionally at least about 80%, preferably at least about 85%,more preferably at least about 90% and most preferably at least about95% homologous to the sequenceMLSLSLCSHLWGPLILSALQARSTDSLDGPGEGSVQPLPTAGGPSVKGKPGKRLSAPRGPFPRLADCAHFHYENVDFGHIQLLLSPDREGPSLSGENELVFGVQVTCQGRSWPVLRSYDDFRSLDAHLHRCIFDRRFSCLPELPPPPEGARAAQMLVPLLLQYLETLSGLVDSNLNCGPVLTWMELDNHGRRLLLSEEASLNIPAVAAAHVIKRYTAQAPDELSFEVGDIVSVIDMPPTEDRSWWRGKRGFQVGFFPSECVELFTERPGPGLKADADGPPCGIPAPQGISSLTSAVPRPRGKLAGLLRTFMRSRPSRQRLRQRGILRQRVFGCDLGEHLSNSGQDVPQVLRCCSEFIEAHGVVDGIYRLSGVSSNIQRLRHEFDSERIPELSGPAFLQDIHSVSSLCKLYFRELPNPLLTYQLYGKFSEAMSVPGEEERLVRV (SEQ ID NO: 1734) of T08446_PEA_(—)1_P18 (SEQ ID NO:1370).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding forT08446_PEA_(—)1_P18 (SEQ ID NO:1370), comprising a first amino acidsequence being at least 70%, optionally at least 80%, preferably atleast 85%, more preferably at least 90% and most preferably at least 95%homologous to a polypeptide having the sequenceMLSLSLCSHLWGPLILSALQARSTDSLDGPGEGSVQPLPTAGGPSVKGKPGKRLSAPRGPFPRLADCAHFHYENVDFGHIQLLLSPDREGPSLSGENELVFGVQVTCQGRSWPVLRSYDDFRSLDAHLHRCIFDRRFSCLPELPPPPEGARAAQMLVPLLLQYLETLSGLVDSNLNCGPVLTWMELDNHGRRLLLSEEASLNIPAVAAAHVIKRYTAQAPDELSFEVGDIVSVIDMPPTEDRSWWRGKRGFQVGFFPSECVELFTERPGPGLKADADGPPCGIPAPQGISSLTSAVPRPRGKLAGLLRTFMRSRPSRQRLRQRGILRQRVFGCDLGEHLSNSGQDVPQVLRCCSEFIEAHGVVDGIYRLSGVSSNIQRLRHEFDSERIPELSGPAFLQDIHSVSSLCKLYFRELPNPLLTYQLYGKFSEAMSVPGEEERLVRVHDVIQQLPPPHYRTLEYLLRHLARMARHSANTSMHARNLAIVWAPNLLRSMELESVGMGGAAAFREVRVQSVVVEFLLTHVDVLFSDTFTSAGLDPAGRCLLPRPKSLAGSCPSTRLLTLEEAQARTQGRLGTPTEPTTPKAPASPAERRKGERGEKQRKPGGSSWKTFFALGRGPSVPRKKPLPWLGGTRAPPQPSGSRPDTVTLRSAKSEESLSSQASGAGLQRLHRLRRPHSSSDAFPVGPAPAGSCESLSSSSSSESSSSESSSSSSESSAAGLGALSGSPSHRTSAWLDDGDELDFSPPRCLEGLRGLDFDPLTFRCSSPTPGDPAPPASPAPPAPASAFPPRVTPQAISPRGPTSPASPAALDISEPLAVSVPPAVLELLGAGGAPASATPTPALSPGRSLRPHLIPLLLRGAEAPLTDACQQEMCSKLRGAQGPLGPDMESPLPPPPLSLLRPGGAPPPPPKNPARLMALALAERAQQVAEQQSQQECGGTPPASQSPFHRSLSLEVGGEPLGTSGSGPPPNSLAHPGAWVPGPPPYLPRQQSDGSLLRSQRPMGTSRRG corresponding to amino acids 1-1010 of T08446_PEA_(—)1_P18 (SEQID NO:1370), and a second amino acid sequence being at least 90%homologous toLRGPAQVSAQLRAGGGGRDAPEAAAQSPCSVPSQVPTPGFFSPAPRECLPPFLGVPKPGLYPLGPPSFQPSSPAPVWRSSLGPPAPLDRGENLYYEIGASEGSPYSGPTRSWSPFRSMPPDRLNASYGMLGQSPPLHRSPDFLLSYPPAPSCFPPDHLGYSAPQHPARRPTPPEPLYVNLALGPRGPSPASSSSSSPPAHPRSRSDPGPPVPRLPQKQRAPWGPRTPHRVPGPWGPPEPLLLYRAAPPAYGRGGELHRGSLYRNGGQRGEGAGPPPPYPTPSWSLHSEGQTRSYC corresponding to amino acids 1-295 of Q96CP3 (SEQ ID NO:1444),which also corresponds to amino acids 1011-1305 of T08446_PEA1_P18 (SEQID NO:1370), wherein said first amino acid 
sequence and second amino acid sequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there isprovided an isolated polypeptide encoding for a head ofT08446PEA_(—)1_P18 (SEQ ID NO:1370), comprising a polypeptide being atleast 70%, optionally at least about 80%, preferably at least about 85%,more preferably at least about 90% and most preferably at least about95% homologous to the sequenceMLSLSLCSHLWGPLILSALQARSTDSLDGPGEGSVQPLPTAGGPSVKGKPGKRLSAPRGPFPRLADCAHFHYENVDFGHIQLLLSPDREGPSLSGENELVFGVQVTCQGRSWPVLRSYDDFRSLDAHLHRCIFDRRFSCLPELPPPPEGARAAQMLVPLLLQYLETLSGLVDSNLNCGPVLTWMELDNHGRRLLLSEEASLNIPAVAAAHVIKRYTAQAPDELSFEVGDIVSVIDMPPTEDRSWWRGKRGFQVGFFPSECVELFTERPGPGLKADADGPPCGIPAPQGISSLTSAVPRPRGKLAGLLRTFMRSRPSRQRLRQRGILRQRVFGCDLGEHLSNSGQDVPQVLRCCSEFIEAHGVVDGIYRLSGVSSNIQRLRHEFDSERIPELSGPAFLQDIHSVSSLCKLYFRELPNPLLTYQLYGKFSEAMSVPGEEERLVRVHDVIQQLPPPHYRTLEYLLRHLARMARHSANTSMHARNLAIVWAPNLLRSMELESVGMGGAAAFREVRVQSVVVEFLLTHVDVLFSDTFTSAGLDPAGRCLLPRPKSLAGSCPSTRLLTLEEAQARTQGRLGTPTEPTTPKAPASPAERRKGERGEKQRKPGGSSWKTFFALGRGPSVPRKKPLPWLGGTRAPPQPSGSRPDTVTLRSAKSEESLSSQASGAGLQRLHRLRRPHSSSDAFPVGPAPAGSCESLSSSSSSESSSSESSSSSSESSAAGLGALSGSPSHRTSAWLDDGDELDFSPPRCLEGLRGLDFDPLTFRCSSPTPGDPAPPASPAPPAPASAFPPRVTPQAISPRGPTSPASPAALDISEPLAVSVPPAVLELLGAGGAPASATPTPALSPGRSLRPHLIPLLLRGAEAPLTDACQAEMCSKLRGAQGPLGPDMESPLPPPPSLLRPGGAPPPPPKNPARLMALALAERAQQVAEQQSQQECGGTPPASQSPFHRSLSLEVGGEPLGTSGSGPPPNSLAHPGAWVPGPPPYLPRQQSDGSLLRSQRPMGTSRRG of T08446_PEA_(—)1_P18 (SEQ ID NO:1370).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding forT08446_PEA_(—)1_P18 (SEQ ID NO:1370), comprising a first amino acidsequence being at least 70%, optionally at least 80%, preferably atleast 85%, more preferably at least 90% and most preferably at least 95%homologous to a polypeptide having the sequenceMLSLSLCSHLWGPLILSALQARSTDSLDGPGEGSVQPLPTAGGPSVKGKPGKRLSAPRGPFPRLADCAHFHYENVDFGHIQLLLSPDREGPSLSGENELVFGVQVTCQGRSWPVLRSYDDFRSLDAHLHRCIFDRRFSCLPELPPPPEGARAAQ corresponding to amino acids 1-154 of T08446_PEA_(—)1_P18(SEQ ID NO:1370), a second amino acid sequence being at least 90%homologous toMLVPLLLQYLETLSGLVDSNLNCGPVLTWMELDNHGRRLLLSEEASLNIPAVAAAHVIKRYTAQAPDELSFEVGDIVSVIDMPPTEDRSWWRGKRGFQVGFFPSECVELFTERPGPGLKADADGPPCGIPAPQGISSLTSAVPRPRGKLAGLLRTFMRSRPSRQRLRQRGILRQRVFGCDLGEHLSNSGQDVPQVLRCCSEFIEAHGVVDGIYRLSGVSSNIQRLRHEFDSERMELSGPAFLQDIHSVSSLCKLYFRELPNPLLTYQLYGKFSEAMSVPGEEERLVRVHDVIQQLPPPHYRTLEYLLRHLARMARHSANTSMHARNLAIVWAPNLLRSMELESVGMGGAAAFREVRVQSVVVEFLLTHVDVLFSDTFTSAGLDPAGRCLLPRPKSLAGSCPSTRLLTLEEAQARTQGRLGTPTEPTTPKAPASPAERRKGERGEKQRKPGGSSWKTFFALGRGPSVPRKKPLPWLGGTRAPPQPSGSRPDTVTLRSAKSEESLSSQASGAGLQRLHRLRRPHSSSDAFPVGPAPAGSCESLSSSSSSESSSSESSSSSSESSAAGLGALSGSPSHRTSAWLDDGDELDFSPPRCLEGLRGLDFDPLTFRCSSPTPGDPAPPASPAPPAPASAFPPRVTPQAISPRGPTSPASPAALDISEPLAVSVPPAVLELLGAGGAPASATPTPALSPGRSLRPHLIPLLLRGAEAPLTDACQQEMCSKLRGAQGPLGPDMESPLPPPPLSLLRPGGAPPPPPKNPARLMALALAERAQQVAEQQSQQECGGTPPASQSPFHRSLSLEVGGEPLGTSGSGPPPNSLAHPGAWVPGPPPYLPRQQSDGSLLRSQRPMGTSRRGLRGPA corresponding to amino acids 1-861 of BAC86902 (SEQ ID NO:1445), whichalso corresponds to amino acids 155-1015 of T08446_PEA_(—)1_P18 (SEQ IDNO:1370), a third amino acid sequence being at least 70%, optionally atleast 80%, preferably at least 85%, more preferably at least 90% andmost preferably at least 95% homologous to a polypeptide having thesequence QVSAQLRAGGGGRDAPEAAAQSPCSVPS corresponding to amino acids1016-1043 of T08446_PEA_(—)1_P18 (SEQ ID NO:1370), a fourth 
amino acid sequence being at least 90% homologous to QVPTPGFFSPAPRECLPPFLGVPKPGLYPLGPPSFQPSSPAPVWRSSLGPPAPLDRGENLYYEIGASEGSPYSGPTRSWSPFRSMPPDRLNASYGMLGQSPPLHRSPDFLLSYPPAPSCFPPDHLGYS corresponding to amino acids 862-989 of BAC86902 (SEQ ID NO:1445), which also corresponds to amino acids 1044-1171 of T08446_PEA_(—)1_P18 (SEQ ID NO:1370), and a fifth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence APQHPARRPTPPEPLYVNLALGPRGPSPASSSSSSPPAHPRSRSDPGPPVPRLPQKQRAPWGPRTPHRVPGPWGPPEPLLLYRAAPPAYGRGGELHRGSLYRNGGQRGEGAGPPPPYPTPSWSLHSEGQTRSYC corresponding to amino acids 1172-1305 of T08446_PEA_(—)1_P18 (SEQ ID NO:1370), wherein said first amino acid sequence, second amino acid sequence, third amino acid sequence, fourth amino acid sequence and fifth amino acid sequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of T08446_PEA_(—)1_P18 (SEQ ID NO:1370), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MLSLSLCSHLWGPLILSALQARSTDSLDGPGEGSVQPLPTAGGPSVKGKPGKRLSAPRGPFPRLADCAHFHYENVDFGHIQLLLSPDREGPSLSGENELVFGVQVTCQGRSWPVLRSYDDFRSLDAHLHRCIFDRRFSCLPELPPPPEGARAAQ of T08446_PEA_(—)1_P18 (SEQ ID NO:1370).

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for an edge portion of T08446_PEA_(—)1_P18 (SEQ ID NO:1370), comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for QVSAQLRAGGGGRDAPEAAAQSPCSVPS, corresponding to T08446_PEA_(—)1_P18 (SEQ ID NO:1370).

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T08446_PEA_(—)1_P18 (SEQ ID NO:1370), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence APQHPARRPTPPEPLYVNLALGPRGPSPASSSSSSPPAHPRSRSDPGPPVPRLPQKQRAPWGPRTPHRVPGPWGPPEPLLLYRAAPPAYGRGGELHRGSLYRNGGQRGEGAGPPPPYPTPSWSLHSEGQTRSYC in T08446_PEA_(—)1_P18 (SEQ ID NO:1370).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T11628_PEA_(—)1_P2 (SEQ ID NO:1376), comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MGLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDE (SEQ ID NO: 1735) corresponding to amino acids 1-55 of T11628_PEA_(—)1_P2 (SEQ ID NO:1376), and a second amino acid sequence being at least 90% homologous to MKASEDLKKHGATVLTALGGILKKKGHHEAEIKPLAQSHATKHKIPVKYLEFISECIIQVLQSKHPGDFGADAQGAMNKALELFRKDMASNYKELGFQG corresponding to amino acids 1-99 of Q8WVH6 (SEQ ID NO:1450), which also corresponds to amino acids 56-154 of T11628_PEA_(—)1_P2 (SEQ ID NO:1376), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of T11628_PEA_(—)1_P2 (SEQ ID NO:1376), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MGLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDE (SEQ ID NO: 1735) of T11628_PEA_(—)1_P2 (SEQ ID NO:1376).
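For illustration only, the graded thresholds recited above (at least 70%, 80%, 85%, 90% and 95%) may be pictured with the following Python sketch. This passage does not itself state how percent homology is computed, so the sketch assumes a simple ungapped, position-by-position identity between two sequences of equal length; in practice an alignment program would ordinarily be used, and the result may differ. The function names and the substituted test sequence are hypothetical and are not part of the disclosure.

```python
from typing import Optional

# Thresholds as recited in the text, highest first.
TIERS = (95, 90, 85, 80, 70)

# Head sequence recited above as SEQ ID NO: 1735 (55 amino acids).
HEAD = "MGLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDE"


def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity; an assumption standing in for 'homology'."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def highest_tier(pct: float) -> Optional[int]:
    """Return the highest recited threshold met by pct, or None if below 70%."""
    for tier in TIERS:
        if pct >= tier:
            return tier
    return None


if __name__ == "__main__":
    # Hypothetical candidate: the first three residues replaced by alanine.
    candidate = "AAA" + HEAD[3:]
    pct = percent_identity(HEAD, candidate)
    print(round(pct, 1), highest_tier(pct))
```

With three of 55 positions changed, the hypothetical candidate retains 52 identical positions and therefore meets the 90% threshold but not the 95% threshold under this simplified measure.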

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T11628_PEA_(—)1_P5 (SEQ ID NO:1377), comprising a first amino acid sequence being at least 90% homologous to MKASEDLKKHGATVLTALGGILKKKGHHEAEIKPLAQSHATKHKIPVKYLEFISECIIQVLQSKHPGDFGADAQGAMNKALELFRKDMASNYKELGFQG corresponding to amino acids 56-154 of MYG_HUMAN_V1 (SEQ ID NO:1449), which also corresponds to amino acids 1-99 of T11628_PEA_(—)1_P5 (SEQ ID NO:1377).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T11628_PEA_(—)1_P7 (SEQ ID NO:1378), comprising a first amino acid sequence being at least 90% homologous to MGLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKASEDLKKHGATVLTALGGILKKKGHHEAEIKPLAQSHATKHKIPVKYLEFISECIIQVLQSKHPGDFGADAQGAMNK corresponding to amino acids 1-134 of MYG_HUMAN_V1 (SEQ ID NO:1449), which also corresponds to amino acids 1-134 of T11628_PEA_(—)1_P7 (SEQ ID NO:1378), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence G corresponding to amino acids 135-135 of T11628_PEA_(—)1_P7 (SEQ ID NO:1378), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T11628_PEA_(—)1_P10 (SEQ ID NO:1379), comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MGLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDE (SEQ ID NO: 1735) corresponding to amino acids 1-55 of T11628_PEA_(—)1_P10 (SEQ ID NO:1379), and a second amino acid sequence being at least 90% homologous to MKASEDLKKHGATVLTALGGILKKKGHHEAEIKPLAQSHATKHKIPVKYLEFISECIIQVLQSKHPGDFGADAQGAMNKALELFRKDMASNYKELGFQG corresponding to amino acids 1-99 of Q8WVH6 (SEQ ID NO:1450), which also corresponds to amino acids 56-154 of T11628_PEA_(—)1_P10 (SEQ ID NO:1379), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of T11628_PEA_(—)1_P10 (SEQ ID NO:1379), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MGLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDE (SEQ ID NO: 1735) of T11628_PEA_(—)1_P10 (SEQ ID NO:1379).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding forR35137_PEA_(—)1_PEA_(—)1_PEA_(—)1_P9 (SEQ ID NO:1385), comprising afirst amino acid sequence being at least 90% homologous toMASSTGDRSQAVRHGLRAKVLTLDGMNPRVRRVEYAVRGPIVQRALELEQELRQGVKKPFTEVIRANIGDAQAMGQRPITFLRQVLALCVNPDLLSSPNFPDDAKKRAERILQACGGHSLGAYSVSSGIQLIREDVARYIERRDGGIPADPNNVFLSTGASDAIVTVLKLLVAGEGHTRTGVLIPIPQYPLYSATLAELGAVQVDYYLDEERAWALDVAELHRALGQARDHCRPRALCVINPGNPTGQVQTRECIEAVIRFAFEERLFLLADEVcorresponding to amino acids 1-274 of ALAT_HUMAN_V1 (SEQ ID NO:1453),which also corresponds to amino acids 1-274 ofR35137_PEA_(—)1_PEA_(—)1_PEA_(—)1_P9 (SEQ ID NO:1385), and a secondamino acid sequence being at least 70%, optionally at least 80%,preferably at least 85%, more preferably at least 90% and mostpreferably at least 95% homologous to a polypeptide having the sequenceRGAGEREAGQQSAPVTPCALPGVPGQRVRRGFAVPLIQEGAHGDGAALRRAAGACLLPLHLQGLHGRVRAYEAGGGSRAMARPSSPDGPPPPPHLTWPCAGAGSAAAMWRW (SEQ ID NO: 1737)corresponding to amino acids 275-385 ofR35137_PEA_(—)1_PEA_(—)1_PEA_(—)1_P9 (SEQ ID NO:1385), wherein saidfirst amino acid sequence and second amino acid sequence are contiguousand in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R35137_PEA_(—)1_PEA_(—)1_PEA_(—)1_P9 (SEQ ID NO:1385), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence RGAGEREAGQQSAPVTPCALPGVPGQRVRRGFAVPLIQEGAHGDGAALRRAAGACLLPLHLQGLHGRVRAYEAGGGSRAMARPSSPDGPPPPPHLTWPCAGAGSAAAMWRW (SEQ ID NO: 1737) in R35137_PEA_(—)1_PEA_(—)1_PEA_(—)1_P9 (SEQ ID NO:1385).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding forR35137_PEA_(—)1_PEA_(—)1_PEA_(—)1_P8 (SEQ ID NO:1386), comprising afirst amino acid sequence being at least 90% homologous toMASSTGDRSQAVRHGLRAKVLTLDGMNPRVRRVEYAVRGPIVQRALELEQELRQGVKKPFTEVIRANIGDAQAMGQRPITFLRQVLALCVNPDLLSSPNFPDDAKKRAERILQACGGHSLGAYSVSSGIQLIREDVARYIERRDGGIPADPNNVFLSTGASDAIVTVLKLLVAGEGHTRTGVLIPIPQYPLYSATLAELGAVQVDYYLDEERAWALDVAELHRALGQARDHCRPRALCVINPGNPTGQVQTRECIEAVIRFAFEERLFLLADEVYQDNVYAAGSQFHSFKKVLMEMGPPYAGQQELASFHSTSKGYMGEC corresponding to amino acids 1-320of ALAT_HUMAN_V1 (SEQ ID NO:1453), which also corresponds to amino acids1-320 of R35137_PEA_(—)1_PEA_(—)1_PEA_(—)1_P8 (SEQ ID NO:1386), and asecond amino acid sequence being at least 70%, optionally at least 80%,preferably at least 85%, more preferably at least 90% and mostpreferably at least 95% homologous to a polypeptide having the sequenceVRTRRVGARGPWPGPPRPMGHPLLRT (SEQ ID NO: 1738) corresponding to aminoacids 321-346 of R35137_PEA_(—)1_PEA_(—)1_PEA_(—)1_P8 (SEQ ID NO:1386),wherein said first amino acid sequence and second amino acid sequenceare contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R35137_PEA_(—)1_PEA_(—)1_PEA_(—)1_P8 (SEQ ID NO:1386), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRTRRVGARGPWPGPPRPMGHPLLRT (SEQ ID NO: 1738) in R35137_PEA_(—)1_PEA_(—)1_PEA_(—)1_P8 (SEQ ID NO:1386).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R35137_PEA_(—)1_PEA_(—)1_PEA_(—)1_P11 (SEQ ID NO:1387), comprising a first amino acid sequence being at least 90% homologous to MASSTGDRSQAVRHGLRAKVLTLDGMNPRVRRVEYAVRGPIVQRALELEQELRQGVKKPFTEVIRANIGDAQAMGQRPITFLRQVLALCVNPDLLSSPNFPDDAKKRAERILQACGGHSLGAYSVSSGIQLIREDVARYIERRDGGIPADPNNVFLSTGASDAIVTVLKLLVAGEGHTRTGVLIPIPQYPLYSATLAELGAVQVDYYLDEERAWALDVAELHRALGQAR corresponding to amino acids 1-229 of ALAT_HUMAN_V1 (SEQ ID NO:1453), which also corresponds to amino acids 1-229 of R35137_PEA_(—)1_PEA_(—)1_PEA_(—)1_P11 (SEQ ID NO:1387), and a second amino acid sequence being at least 90% homologous to SGFGQREGTYHFRMTILPPLEKLRLLLEKLSRFHAKFTLEYS corresponding to amino acids 455-496 of ALAT_HUMAN_V1 (SEQ ID NO:1453), which also corresponds to amino acids 230-271 of R35137_PEA_(—)1_PEA_(—)1_PEA_(—)1_P11 (SEQ ID NO:1387), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of R35137_PEA_(—)1_PEA_(—)1_PEA_(—)1_P11 (SEQ ID NO:1387), comprising a polypeptide having a length “n”, wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise RS, having a structure as follows: a sequence starting from any of amino acid numbers 229−x to 229; and ending at any of amino acid numbers 230+((n−2)−x), in which x varies from 0 to n−2.
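As a non-limiting illustration, the start and end positions defined by the above structure may be enumerated with a short Python sketch. For a given length n, each value of x from 0 to n−2 yields one window that begins at position 229−x and ends at position 230+((n−2)−x), so that every window has length n and spans both junction residues (R at position 229 and S at position 230). The function name `edge_windows` and the optional `seq_len` filter are assumptions introduced for this sketch; the value 271 used in the usage example is taken from the last recited coordinate of SEQ ID NO:1387.

```python
from typing import List, Optional, Tuple


def edge_windows(n: int, left: int = 229, right: int = 230,
                 seq_len: Optional[int] = None) -> List[Tuple[int, int]]:
    """Enumerate (start, end) positions, 1-based and inclusive, of every
    length-n window that contains both junction residues.

    start = left - x and end = right + ((n - 2) - x), for x = 0 .. n-2.
    If seq_len is given (an added convenience, not part of the recited
    structure), windows that fall outside 1..seq_len are dropped.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to span the junction")
    windows = []
    for x in range(0, n - 1):          # x varies from 0 to n-2 inclusive
        start = left - x
        end = right + ((n - 2) - x)
        if start < 1:
            continue
        if seq_len is not None and end > seq_len:
            continue
        windows.append((start, end))
    return windows


if __name__ == "__main__":
    for start, end in edge_windows(10):
        print(start, end, end - start + 1)
    # Number of length-50 windows that fit within an assumed length of 271.
    print(len(edge_windows(50, seq_len=271)))
```

For n = 10 the sketch yields nine windows, from positions 229-238 (x = 0) to positions 221-230 (x = 8).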

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R35137_PEA_(—)1_PEA_(—)1_PEA_(—)1_P2 (SEQ ID NO:1388), comprising a first amino acid sequence being at least 90% homologous to MASSTGDRSQAVRHGLRAKVLTLDGMNPRVRRVEYAVRGPIVQRALELEQELRQGVKKPFTEVIRANIGDAQAMGQRPITFLRQVLALCVNPDLLSSPNFPDDAKKRAERILQACGGHSLGAYSVSSGIQLIREDVARYIERRDGGIPADPNNVFLSTGASDAIVTVLKLLVAGEGHTRTGVLIPIPQYPLYSATLAELGAVQVDYYLDEERAWALDVAELHRALGQARDHCRPRALCVINPGNPTGQVQTRECIEAVIRFAFEERLFLLADEV corresponding to amino acids 1-274 of ALAT_HUMAN_V1 (SEQ ID NO:1453), which also corresponds to amino acids 1-274 of R35137_PEA_(—)1_PEA_(—)1_PEA_(—)1_P2 (SEQ ID NO:1388), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence RGAGEREAGQQSAPVTPCALPGVPGQRVRRGFAVPLIQEGAHGDGAALRRAAGACLLPLHLQGLHGRVRVPRRLCGGGEHGRCSAAADAEADECAAVPAGARTGPAGPGGQPARAHRPLLCAVPG (SEQ ID NO: 1739) corresponding to amino acids 275-399 of R35137_PEA_(—)1_PEA_(—)1_PEA_(—)1_P2 (SEQ ID NO:1388), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R35137_PEA_(—)1_PEA_(—)1_PEA_(—)1_P2 (SEQ ID NO:1388), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence RGAGEREAGQQSAPVTPCALPGVPGQRVRRGFAVPLIQEGAHGDGAALRRAAGACLLPLHLQGLHGRVRVPRRLCGGGEHGRCSAAADAEADECAAVPAGARTGPAGPGGQPARAHRPLLCAVPG (SEQ ID NO: 1739) in R35137_PEA_(—)1_PEA_(—)1_PEA_(—)1_P2 (SEQ ID NO:1388).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding forR35137_PEA_(—)1_PEA_(—)1_PEA_(—)1_P4 (SEQ ID NO:1389), comprising afirst amino acid sequence being at least 90% homologous toMASSTGDRSQAVRHGLRAKVLTLDGMNPRVRRVEYAVRGPIVQRALELEQELRQGVKKPFTEVIRANIGDAQAMGQRPITFLRQVLALCVNPDLLSSPNFPDDAKKRAERILQACGGHSLGAYSVSSGIQLIREDVARYIERRDGGIPADPNNVFLSTGASDAIVTVLKLLVAGEGHTRTGVLIPIPQYPLYSATLAELGAVQVDYYLDEERAWALDVAELHRALGQARDHCRPRALCVINPGNPTGQVQTRECIEAVIRFAFEERLFLLADEVYQDNVYAAGSQFHSFKKVLMEMGPPYAGQQELASFHSTSKGYMGECGFRGGYVEVVNMDAAVQQQMLKLMSVRLCPPVPGQALLDLVVSPPAPTDPSFAQFQAEKQAVLAELAAKAKLTEQVFNEAPGISCNPVQGAMYSFPRVQLPPRAVERAQELGLAPDMFFCLRLLEETGICVVPGSGFGQREGTYHFRMTILPPLEKLRLLLEKLSRFHAKFTLE corresponding to amino acids 1-494 of ALAT_HUMAN_V1 (SEQ ID NO:1453),which also corresponds to amino acids 1-494 ofR35137_PEA_(—)1_PEA_(—)1_PEA_(—)1_P4 (SEQ ID NO:1389), and a secondamino acid sequence being at least 70%, optionally at least 80%,preferably at least 85%, more preferably at least 90% and mostpreferably at least 95% homologous to a polypeptide having the sequenceSPGRLWSPLYLLLMPGGVGWGGCWAPASLQVPNKAVWQSDSKKEALAAAWPAPTCLPFLQA (SEQ IDNO: 1740) corresponding to amino acids 495-555 ofR35137_PEA_(—)1_PEA_(—)1_PEA_(—)1_P4 (SEQ ID NO:1389), wherein saidfirst amino acid sequence and second amino acid sequence are contiguousand in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R35137_PEA_(—)1_PEA_(—)1_PEA_(—)1_P4 (SEQ ID NO:1389), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SPGRLWSPLYLLLMPGGVGWGGCWAPASLQVPNKAVWQSDSKKEALAAAWPAPTCLPFLQA (SEQ ID NO: 1740) in R35137_PEA_(—)1_PEA_(—)1_PEA_(—)1_P4 (SEQ ID NO:1389).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R11723_PEA_(—)1_P6 (SEQ ID NO:1410), comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAGIMYRKSCASSAACLIASAGSPCRGLAPGREEQRALHKAGAVGGGVR (SEQ ID NO: 1741) corresponding to amino acids 1-110 of R11723_PEA_(—)1_P6 (SEQ ID NO:1410), and a second amino acid sequence being at least 90% homologous to MYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ corresponding to amino acids 1-112 of Q8IXM0 (SEQ ID NO:1707), which also corresponds to amino acids 111-222 of R11723_PEA_(—)1_P6 (SEQ ID NO:1410), wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of R11723_PEA_(—)1_P6 (SEQ ID NO:1410), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAGIMYRKSCASSAACLIASAGSPCRGLAPGREEQRALHKAGAVGGGVR (SEQ ID NO: 1741) of R11723_PEA_(—)1_P6 (SEQ ID NO:1410).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding forR11723_PEA_(—)1_P6 (SEQ ID NO:1410), comprising a first amino acidsequence being at least 90% homologous toMWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAGIMYRKSCASSAACLIASAG corresponding to amino acids 1-83 of Q96AC2 (SEQ IDNO:1708), which also corresponds to amino acids 1-83 of R11723_PEA_(—)1_P6 (SEQ ID NO:1410), and a second amino acid sequence being atleast 70%, optionally at least 80%, preferably at least 85%, morepreferably at least 90% and most preferably at least 95% homologous to apolypeptide having the sequenceSPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ (SEQ ID NO: 1742) corresponding to amino acids 84-222 ofR11723_PEA_(—)1_P6 (SEQ ID NO:1410), wherein said first and second aminoacid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there isprovided an isolated polypeptide encoding for a tail ofR11723_PEA_(—)1_P6 (SEQ ID NO:1410), comprising a polypeptide being atleast 70%, optionally at least about 80%, preferably at least about 85%,more preferably at least about 90% and most preferably at least about95% homologous to the sequenceSPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ (SEQ ID NO: 1742) in R11723_PEA_(—)1_P6 (SEQ ID NO:1410).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding forR11723_PEA_(—)1_P6 (SEQ ID NO:1410), comprising a first amino acidsequence being at least 90% homologous toMWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAGIMYRKSCASSAACLIASAG corresponding to amino acids 1-83 of Q8N2G4 (SEQ IDNO:1709), which also corresponds to amino acids 1-83 ofR11723_PEA_(—)1_P6 (SEQ ID NO:1410), and a second amino acid sequencebeing at least 70%, optionally at least 80%, preferably at least 85%,more preferably at least 90% and most preferably at least 95% homologousto a polypeptide having the sequenceSPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ (SEQ ID NO: 1742) corresponding to amino acids 84-222 ofR11723_PEA_(—)1_P6 (SEQ ID NO:1410), wherein said first and second aminoacid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there isprovided an isolated polypeptide encoding for a tail ofR11723_PEA_(—)1_P6 (SEQ ID NO:1410), comprising a polypeptide being atleast 70%, optionally at least about 80%, preferably at least about 85%,more preferably at least about 90% and most preferably at least about95% homologous to the sequenceSPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ (SEQ ID NO: 1742) in R11723_PEA_(—)1_P6 (SEQ ID NO:1410).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding forR11723_PEA_(—)1_P6 (SEQ ID NO:1410), comprising a first amino acidsequence being at least 90% homologous toMWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAGIMYRKSCASSAACLIASAG corresponding to amino acids 24-106 of BAC85518 (SEQ IDNO:1710), which also corresponds to amino acids 1-83 ofR11723_PEA_(—)1_P6 (SEQ ID NO:1410), and a second amino acid sequencebeing at least 70%, optionally at least 80%, preferably at least 85%,more preferably at least 90% and most preferably at least 95% homologousto a polypeptide having the sequenceSPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ (SEQ ID NO: 1742) corresponding to amino acids 84-222 ofR11723_PEA_(—)1_P6 (SEQ ID NO:1410), wherein said first and second aminoacid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there isprovided an isolated polypeptide encoding for a tail ofR11723_PEA_(—)1_P6 (SEQ ID NO:1410), comprising a polypeptide being atleast 70%, optionally at least about 80%, preferably at least about 85%,more preferably at least about 90% and most preferably at least about95% homologous to the sequenceSPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ (SEQ ID NO: 1742) in R11723_PEA_(—)1_P6 (SEQ ID NO:1410).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R11723_PEA_(—)1_P7 (SEQ ID NO:1411), comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAG corresponding to amino acids 1-64 of Q96AC2 (SEQ ID NO:1708), which also corresponds to amino acids 1-64 of R11723_PEA_(—)1_P7 (SEQ ID NO:1411), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT (SEQ ID NO: 1743) corresponding to amino acids 65-93 of R11723_PEA_(—)1_P7 (SEQ ID NO:1411), wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R11723_PEA_(—)1_P7 (SEQ ID NO:1411), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT (SEQ ID NO: 1743) in R11723_PEA_(—)1_P7 (SEQ ID NO:1411).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R11723_PEA_(—)1_P7 (SEQ ID NO:1411), comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAG corresponding to amino acids 1-64 of Q8N2G4 (SEQ ID NO:1709), which also corresponds to amino acids 1-64 of R11723_PEA_(—)1_P7 (SEQ ID NO:1411), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT (SEQ ID NO: 1743) corresponding to amino acids 65-93 of R11723_PEA_(—)1_P7 (SEQ ID NO:1411), wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R11723_PEA_(—)1_P7 (SEQ ID NO:1411), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT (SEQ ID NO: 1743) in R11723_PEA_(—)1_P7 (SEQ ID NO:1411).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding forR11723_PEA_(—)1_P7 (SEQ ID NO:1411), comprising a first amino acidsequence being at least 70%, optionally at least 80%, preferably atleast 85%, more preferably at least 90% and most preferably at least 95%homologous to a polypeptide having the sequence MWVLG (SEQ ID NO: 1744)corresponding to amino acids 1-5 of R11723_PEA_(—)1_P7 (SEQ ID NO:1411),second amino acid sequence being at least 90% homologous toIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAGcorresponding to amino acids 22-80 of BAC85273, which also correspondsto amino acids 6-64 of R11723_PEA_(—)1_P7 (SEQ ID NO:1411), and a thirdamino acid sequence being at least 70%, optionally at least 80%,preferably at least 85%, more preferably at least 90% and mostpreferably at least 95% homologous to a polypeptide having the sequenceSHCVTRLECSGTISAHCNLCLPGSNDHPT (SEQ ID NO: 1743) corresponding to aminoacids 65-93 of R11723_PEA_(—)1_P7 (SEQ ID NO:1411), wherein said first,second and third amino acid sequences are contiguous and in a sequentialorder.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of R11723_PEA_1_P7 (SEQ ID NO:1411), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MWVLG (SEQ ID NO: 1744) of R11723_PEA_1_P7 (SEQ ID NO:1411).

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R11723_PEA_1_P7 (SEQ ID NO:1411), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT (SEQ ID NO: 1743) in R11723_PEA_1_P7 (SEQ ID NO:1411).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R11723_PEA_1_P7 (SEQ ID NO:1411), comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAG corresponding to amino acids 24-87 of BAC85518 (SEQ ID NO:1710), which also corresponds to amino acids 1-64 of R11723_PEA_1_P7 (SEQ ID NO:1411), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT (SEQ ID NO: 1743) corresponding to amino acids 65-93 of R11723_PEA_1_P7 (SEQ ID NO:1411), wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R11723_PEA_1_P7 (SEQ ID NO:1411), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT (SEQ ID NO: 1743) in R11723_PEA_1_P7 (SEQ ID NO:1411).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R11723_PEA_1_P13 (SEQ ID NO:1412), comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA corresponding to amino acids 1-63 of Q96AC2 (SEQ ID NO:1708), which also corresponds to amino acids 1-63 of R11723_PEA_1_P13 (SEQ ID NO:1412), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DTKRTNTLLFEMRHFAKQLTT (SEQ ID NO: 1745) corresponding to amino acids 64-84 of R11723_PEA_1_P13 (SEQ ID NO:1412), wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R11723_PEA_1_P13 (SEQ ID NO:1412), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DTKRTNTLLFEMRHFAKQLTT (SEQ ID NO: 1745) in R11723_PEA_1_P13 (SEQ ID NO:1412).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R11723_PEA_1_P10 (SEQ ID NO:1413), comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA corresponding to amino acids 1-63 of Q96AC2 (SEQ ID NO:1708), which also corresponds to amino acids 1-63 of R11723_PEA_1_P10 (SEQ ID NO:1413), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK (SEQ ID NO: 1746) corresponding to amino acids 64-90 of R11723_PEA_1_P10 (SEQ ID NO:1413), wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R11723_PEA_1_P10 (SEQ ID NO:1413), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK (SEQ ID NO: 1746) in R11723_PEA_1_P10 (SEQ ID NO:1413).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R11723_PEA_1_P10 (SEQ ID NO:1413), comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA corresponding to amino acids 1-63 of Q8N2G4 (SEQ ID NO:1709), which also corresponds to amino acids 1-63 of R11723_PEA_1_P10 (SEQ ID NO:1413), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK (SEQ ID NO: 1746) corresponding to amino acids 64-90 of R11723_PEA_1_P10 (SEQ ID NO:1413), wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R11723_PEA_1_P10 (SEQ ID NO:1413), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK (SEQ ID NO: 1746) in R11723_PEA_1_P10 (SEQ ID NO:1413).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R11723_PEA_1_P10 (SEQ ID NO:1413), comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MWVLG (SEQ ID NO: 1744) corresponding to amino acids 1-5 of R11723_PEA_1_P10 (SEQ ID NO:1413), a second amino acid sequence being at least 90% homologous to IAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA corresponding to amino acids 22-79 of BAC85273, which also corresponds to amino acids 6-63 of R11723_PEA_1_P10 (SEQ ID NO:1413), and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK (SEQ ID NO: 1746) corresponding to amino acids 64-90 of R11723_PEA_1_P10 (SEQ ID NO:1413), wherein said first, second and third amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of R11723_PEA_1_P10 (SEQ ID NO:1413), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MWVLG (SEQ ID NO: 1744) of R11723_PEA_1_P10 (SEQ ID NO:1413).

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R11723_PEA_1_P10 (SEQ ID NO:1413), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK (SEQ ID NO: 1746) in R11723_PEA_1_P10 (SEQ ID NO:1413).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R11723_PEA_1_P10 (SEQ ID NO:1413), comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA corresponding to amino acids 24-86 of BAC85518 (SEQ ID NO:1710), which also corresponds to amino acids 1-63 of R11723_PEA_1_P10 (SEQ ID NO:1413), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK (SEQ ID NO: 1746) corresponding to amino acids 64-90 of R11723_PEA_1_P10 (SEQ ID NO:1413), wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R11723_PEA_1_P10 (SEQ ID NO:1413), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK (SEQ ID NO: 1746) in R11723_PEA_1_P10 (SEQ ID NO:1413).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R16276_PEA_1_P7 (SEQ ID NO:1414), comprising a first amino acid sequence being at least 90% homologous to MQSVQSTSFCLRKQCLCLTFLLLHLLGQVAATQRCPPQCPG corresponding to amino acids 1-41 of NOV_HUMAN (SEQ ID NO:1463), which also corresponds to amino acids 1-41 of R16276_PEA_1_P7 (SEQ ID NO:1414), a bridging amino acid Q corresponding to amino acid 42 of R16276_PEA_1_P7 (SEQ ID NO:1414), a second amino acid sequence being at least 90% homologous to CPATPPTCAPGVRAVLDGCSCCLVCARQRGESCSDLEPCDESSGLYCDRSADPSNQTGICT corresponding to amino acids 43-103 of NOV_HUMAN (SEQ ID NO:1463), which also corresponds to amino acids 43-103 of R16276_PEA_1_P7 (SEQ ID NO:1414), and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GNPAPSAV (SEQ ID NO: 1748) corresponding to amino acids 104-111 of R16276_PEA_1_P7 (SEQ ID NO:1414), wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R16276_PEA_1_P7 (SEQ ID NO:1414), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GNPAPSAV (SEQ ID NO: 1748) in R16276_PEA_1_P7 (SEQ ID NO:1414).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding forHUMCEA_PEA_(—)1_P4 (SEQ ID NO:1380), comprising a first amino acidsequence being at least 90% homologous toMESPSAPPHRWCIPWQRLLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLVHNLPQHLFGYSWYKGERVDGNRQIIGYVIGTQQATPGPAYSGREHYPNASLLIQNIIQNDTGFYTLHVIKSDLVNEEATGQFRVYPELPKPSISSNNSKPVEDKDAVAFTCEPETQDATYLWWVNNQSLPVSPRLQLSNGNRTLTLFNVTRNDTASYKCETQNPVSARRSDSVILNVL corresponding to amino acids 1-234 of CEA5_HUMAN(SEQ ID NO:1451), which also corresponds to amino acids 1-234 ofHUMCEA_PEA_(—)1_P4 (SEQ ID NO:1380), and a second amino acid sequencebeing at least 70%, optionally at least 80%, preferably at least 85%,more preferably at least 90% and most preferably at least 95% homologousto a polypeptide having the sequenceCEYICSSLAQAASPNPQGQRQDFSVPLRFKYTDPQPWTSRLSVTFCPRKTWADQVLTKNRRGGAASVLGGSGSTPYDGRNR (SEQ ID NO: 1749) corresponding to amino acids 235-315 ofHUMCEA_PEA_(—)1_P4 (SEQ ID NO:1380), wherein said first amino acidsequence and second amino acid sequence are contiguous and in asequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMCEA_PEA_1_P4 (SEQ ID NO:1380), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence CEYICSSLAQAASPNPQGQRQDFSVPLRFKYTDPQPWTSRLSVTFCPRKTWADQVLTKNRRGGAASVLGGSGSTPYDGRNR (SEQ ID NO: 1749) in HUMCEA_PEA_1_P4 (SEQ ID NO:1380).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding forHUMCEA_PEA_(—)1_P5 (SEQ ID NO:1381), comprising a first amino acidsequence being at least 90% homologous toMESPSAPPHRWCIPWQRLLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLVHNLPQHLFGYSWYKGERVDGNRQIIGYVIGTQQATPGPAYSGREIIYPNASLLIQNIIQNDTGFYTLHVIKSDLVNEEATGQFRVYPELPKPSISSNNSKPVEDKDAVAFTCEPETQDATYLWWVNNQSLPVSPRLQLSNGNRTLTLFNVTRNDTASYKCETQNPVSARRSDSVILNVLYGPDAPTISPLNTSYRSGENLNLSCHAASNPPAQYSWFVNGTFQQSTQELFIPNITVNNSGSYTCQAHNSDTGLNRTTVTTITVYAEPPKPFITSNNSNPVEDEDAVALTCEPEIQNTTYLWWVNNQSLPVSPRLQLSNDNRTLTLLSVTRNDVGPYECGIQNELSVDHSDPVILNVLYGPDDPTISPSYTYYRPGVNLSLSCHAASNPPAQYSWLIDGNIQQHTQELFISNITEKNSGLYTCQANNSASGHSRTTVKTITVSAELPKPSISSNNSKPVEDKDAVAFTCEPEAQNTTYLWWVNGQSLPVSPRLQLSNGNRTLTLFNVTRNDARAYVCGIQNSVSANRSDPVTLDVLYGPDTPIISPPDSSYLSGANLNLSCHSASNPSPQYSWRINGIPQQHTQVLFIAKITPNNNGTYACFVSNLATGRNNSIVKSITVS corresponding to amino acids 1-675 ofCEA5_HUMAN (SEQ ID NO:1451), which also corresponds to amino acids 1-675of HUMCEA_PEA_(—)1_P5 (SEQ ID NO:1381), and a second amino acid sequencebeing at least 70%, optionally at least 80%, preferably at least 85%,more preferably at least 90% and most preferably at least 95% homologousto a polypeptide having the sequenceGKWLPGASASYSGVESIWFSPKSQEDIFFPSLCSMGTRKSQILS (SEQ ID NO: 1750)corresponding to amino acids 676-719 of HUMCEA_PEA_(—)1_P5 (SEQ IDNO:1381), wherein said first amino acid sequence and second amino acidsequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMCEA_PEA_1_P5 (SEQ ID NO:1381), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GKWLPGASASYSGVESIWFSPKSQEDIFFPSLCSMGTRKSQILS (SEQ ID NO: 1750) in HUMCEA_PEA_1_P5 (SEQ ID NO:1381).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding forHUMCEA_PEA_(—)1_P19 (SEQ ID NO:1383), comprising a first amino acidsequence being at least 90% homologous toMESPSAPPHRWCIPWQRLLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLVHNLPQHLFGYSWYKGERVDGNRQIIGYVIGTQQATPGPAYSGREHYPNASLLIQNIIQNDTGFYTLHVIKSDLVNEEATGQFRVYPELPKPSISSNNSKPVEDKDAVAFTCEPETQDATYLWWVNNQSLPVSPRLQLSNGNRTLTLFNVTRNDTASYKCETQNPVSARRSDSVILN corresponding to amino acids 1-232 of CEA5_HUMAN (SEQID NO:1451), which also corresponds to amino acids 1-232 ofHUMCEA_PEA_(—)1_P19 (SEQ ID NO:1383), and a second amino acid sequencebeing at least 90% homologous toVLYGPDTPIISPPDSSYLSGANLNLSCHSASNPSPQYSWRINGIPQQHTQVLFIAKITPNNNGTYACFVSNLATGRNNSIVKSITVSASGTSPGLSAGATVGIMIGVLVGVALI corresponding to amino acids589-702 of CEA5_HUMAN (SEQ ID NO:1451), which also corresponds to aminoacids 233-346 of HUMCEA_PEA_(—)1_P19 (SEQ ID NO:1383), wherein saidfirst amino acid sequence and second amino acid sequence are contiguousand in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HUMCEA_PEA_1_P19 (SEQ ID NO:1383), comprising a polypeptide having a length “n”, wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise NV, having a structure as follows: a sequence starting from any of amino acid numbers 232−x to 232; and ending at any of amino acid numbers 233+((n−2)−x), in which x varies from 0 to n−2.
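The structure recited for an edge portion defines a family of windows of length n, each of which spans the junction between the two joined segments. The following minimal sketch, offered for illustration only, enumerates the start and end positions given by the formula above for a junction at amino acids 232 and 233. The function name `edge_windows` and its parameter names are hypothetical.

```python
def edge_windows(junction, n):
    """Enumerate 1-based inclusive (start, end) windows of length n.

    `junction` is the last amino acid of the first segment (232 above),
    so junction + 1 is the first amino acid of the second segment.
    Each window starts at junction - x and ends at
    (junction + 1) + ((n - 2) - x), with x varying from 0 to n - 2.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to span the junction")
    windows = []
    for x in range(0, n - 1):  # x = 0, 1, ..., n - 2
        start = junction - x
        end = (junction + 1) + ((n - 2) - x)
        windows.append((start, end))
    return windows


if __name__ == "__main__":
    for start, end in edge_windows(junction=232, n=10):
        print(f"amino acids {start}-{end}")
```

For n = 10 the formula yields nine windows, from amino acids 232-241 (x = 0) to amino acids 224-233 (x = 8), and every window includes both junction amino acids.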

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding forHUMCEA_PEA_(—)1_P20 (SEQ ID NO:1384), comprising a first amino acidsequence being at least 90% homologous toMESPSAPPHRWCIPWQRLLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLVHNLPQHLFGYSWYKGERVDGNRQIIGYVIGTQQATPGPAYSGREIIYPNASLLIQNIIQNDTGFYTLHVIKSDLVNEEATGQFRVYPcorresponding to amino acids 1-142 of CEA5_HUMAN (SEQ ID NO:1451), whichalso corresponds to amino acids 1-142 of HUMCEA_PEA_(—)1_P20 (SEQ IDNO:1384), and a second amino acid sequence being at least 90% homologoustoELPKPSISSNNSKPVEDKDAVAFTCEPEAQNTTYLWWVNGQSLPVSPRLQLSNGNRTLTLFNVTRNDARAYVCGIQNSVSANRSDPVTLDVLYGPDTPIISPPDSSYLSGANLNLSCHSASNPSPQYSWRINGIPQQHTQVLFIAKITPNNNGTYACFVSNLATGRNNSIVKSITVSASGTSPGLSAGATVGIMIGVLVGVALIcorresponding to amino acids 499-702 of CEA5_HUMAN (SEQ ID NO:1451),which also corresponds to amino acids 143-346 of HUMCEA_PEA_(—)1_P20(SEQ ID NO:1384), wherein said first amino acid sequence and secondamino acid sequence are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HUMCEA_PEA_1_P20 (SEQ ID NO:1384), comprising a polypeptide having a length “n”, wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise PE, having a structure as follows: a sequence starting from any of amino acid numbers 142−x to 142; and ending at any of amino acid numbers 143+((n−2)−x), in which x varies from 0 to n−2.

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding forZ44808_PEA_(—)1_P5 (SEQ ID NO:1314), comprising a first amino acidsequence being at least 90% homologous toMLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQKPLCASDGRTFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQARKEFQQVFIPECNDDGTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKTPRCPGSVNEKLPQREGTGKTDDAAAPALETQPQGDEEDIASRYPTLWTEQVKSRQNKTNKNSVSSCDQEHQSALEEAKQPKNDNVVIPECAHGGLYKPVQCHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPAKARDLYKGRQLQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEERVVHWYFKLLDKNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNNDKSISVQELMGCLGVAKEDGKADTKKRHTPRGHAESTSNRQ corresponding to amino acids 1-441 of SMO2_HUMAN(SEQ ID NO:1430), which also corresponds to amino acids 1-441 ofZ44808_PEA_(—)1_P5 (SEQ ID NO:1314), and a second amino acid sequencebeing at least 70%, optionally at least 80%, preferably at least 85%,more preferably at least 90% and most preferably at least 95% homologousto a polypeptide having the sequence DAMVVSSRPKATTHRKSRTLSRR (SEQ ID NO:1751) corresponding to amino acids 442-464 of Z44808_PEA_(—)1_P5 (SEQ IDNO:1314), wherein said first and second amino acid sequences arecontiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of Z44808_PEA_1_P5 (SEQ ID NO:1314), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DAMVVSSRPKATTHRKSRTLSRR (SEQ ID NO: 1751) in Z44808_PEA_1_P5 (SEQ ID NO:1314).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding forZ44808_PEA_(—)1_P6 (SEQ ID NO:1315), comprising a first amino acidsequence being at least 90% homologous toMLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQKPLCASDGRTFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQARKEFQQVFIPECNDDGTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKTPRCPGSVNEKLPQREGTGKTDDAAAPALETQPQGDEEDIASRYPTLWTEQVKSRQNKTNKNSVSSCDQEHQSALEEAKQPKNDNVVIPECAHGGLYKPVQCHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPAKARDLYKGRQLQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEERVVHWYFKLLDKNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNNDKSISVQELMGCLGVAKEDGKADTKKRH corresponding to amino acids 1-428 of SMO2_HUMAN (SEQ IDNO:1430), which also corresponds to amino acids 1-428 ofZ44808_PEA_(—)1_P6 (SEQ ID NO:1315), and a second amino acid sequencebeing at least 70%, optionally at least 80%, preferably at least 85%,more preferably at least 90% and most preferably at least 95% homologousto a polypeptide having the sequence RSKRNL (SEQ ID NO: 1752)corresponding to amino acids 429-434 of Z44808_PEA_(—)1_P6 (SEQ IDNO:1315), wherein said first and second amino acid sequences arecontiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of Z44808_PEA_1_P6 (SEQ ID NO:1315), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence RSKRNL (SEQ ID NO: 1752) in Z44808_PEA_1_P6 (SEQ ID NO:1315).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding forZ44808_PEA_(—)1_P7 (SEQ ID NO:1316), comprising a first amino acidsequence being at least 90% homologous toMLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQKPLCASDGRTFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQARKEFQQVFIPECNDDGTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKTPRCPGSVNEKLPQREGTGKTDDAAAPALETQPQGDEEDIASRYPTLWTEQVKSRQNKTNKNSVSSCDQEHQSALEEAKQPKNDNVVIPECAHGGLYKPVQCHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPAKARDLYKGRQLQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEERVVHWYFKLLDKNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNNDKSISVQELMGCLGVAKEDGKADTKKRHTPRGHAESTSNRQ corresponding to amino acids 1-441 of SMO2_HUMAN(SEQ ID NO:1430), which also corresponds to amino acids 1-441 ofZ44808_PEA_(—)1_P7 (SEQ ID NO:1316), and a second amino acid sequencebeing at least 70%, optionally at least 80%, preferably at least 85%,more preferably at least 90% and most preferably at least 95% homologousto a polypeptide having the sequence LLWLRGKVSFYCF (SEQ ID NO: 1753)corresponding to amino acids 442-454 of Z44808_PEA_(—)1_P7 (SEQ IDNO:1316), wherein said first and second amino acid sequences arecontiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of Z44808_PEA_1_P7 (SEQ ID NO:1316), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LLWLRGKVSFYCF (SEQ ID NO: 1753) in Z44808_PEA_1_P7 (SEQ ID NO:1316).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding forZ44808_PEA_(—)1_P11 (SEQ ID NO:1317), comprising a first amino acidsequence being at least 90% homologous toMLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQKPLCASDGRTFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQARKEFQQVFIPECNDDGTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKTPRCPGSVNEKLPQREGTGKT corresponding to amino acids 1-170 ofSMO2_HUMAN (SEQ ID NO:1430), which also corresponds to amino acids 1-170of Z44808_PEA_(—)1_P11 (SEQ ID NO:1317), and a second amino acidsequence being at least 90% homologous toDIASRYPTLWTEQVKSRQNKTNKNSVSSCDQEHQSALEEAKQPKNDNVVIPECAHGGLYKPVQCHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPAKARDLYKGRQLQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEERVVHWYFKLLDKNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNNDKSISVQELMGCLGVAKEDGKADTKKRHTPRGHAESTSNRQPRKQG corresponding to aminoacids 188-446 of SMO2_HUMAN (SEQ ID NO:1430), which also corresponds toamino acids 171-429 of Z44808_PEA_(—)1_P11 (SEQ ID NO:1317), whereinsaid first and second amino acid sequences are contiguous and in asequential order.
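Where a variant joins two non-adjacent portions of a known protein, as recited above, the variant coordinates follow from the lengths of the retained portions. The sketch below, provided for illustration only, derives the variant coordinates from the recited coordinates of the known protein. The function name `variant_ranges` is hypothetical, and the sketch assumes that the retained portions are joined directly with no bridging amino acids between them.

```python
def variant_ranges(parent_ranges):
    """Map retained 1-based inclusive ranges of a known protein to the
    coordinates they occupy in the variant.

    Assumption: the retained ranges are joined end to end, in the order
    given, with nothing inserted between them.
    """
    mapped = []
    next_start = 1
    for p_start, p_end in parent_ranges:
        if p_end < p_start:
            raise ValueError("each range must have start <= end")
        length = p_end - p_start + 1
        mapped.append((next_start, next_start + length - 1))
        next_start += length
    return mapped


if __name__ == "__main__":
    # Ranges of SMO2_HUMAN (SEQ ID NO:1430) recited for Z44808_PEA_1_P11.
    parent = [(1, 170), (188, 446)]
    for (ps, pe), (vs, ve) in zip(parent, variant_ranges(parent)):
        print(f"known protein {ps}-{pe} -> variant {vs}-{ve}")
```

Applied to the ranges recited above, the mapping reproduces amino acids 1-170 and 171-429 of the variant, so the junction between the two portions lies between variant amino acids 170 and 171.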

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of Z44808_PEA_1_P11 (SEQ ID NO:1317), comprising a polypeptide having a length “n”, wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise TD, having a structure as follows: a sequence starting from any of amino acid numbers 170−x to 170; and ending at any of amino acid numbers 171+((n−2)−x), in which x varies from 0 to n−2.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for H61775_P16 (SEQ ID NO:1281), comprising a first amino acid sequence being at least 90% homologous to MVWCLGLAVLSLVISQGADGRGKPEVVSVVGRAGESVVLGCDLLPPAGRPPLHVIEWLRFGFLLPIFIQFGLYSPRIDPDYVG corresponding to amino acids 11-93 of Q9P2J2 (SEQ ID NO:1694), which also corresponds to amino acids 1-83 of H61775_P16 (SEQ ID NO:1281), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DCGFPAFRELKRAETVSPVFFTRRCIWEDLKSTGFSPAGGGRPPGGGPRTQEDSGLPCWRSSCSVTLQV (SEQ ID NO: 1754) corresponding to amino acids 84-152 of H61775_P16 (SEQ ID NO:1281), wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of H61775_P16 (SEQ ID NO:1281), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DCGFPAFRELKRAETVSPVFFTRRCIWEDLKSTGFSPAGGGRPPGGGPRTQEDSGLPCWRSSCSVTLQV (SEQ ID NO: 1754) in H61775_P16 (SEQ ID NO:1281).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for H61775_P16 (SEQ ID NO:1281), comprising a first amino acid sequence being at least 90% homologous to MVWCLGLAVLSLVISQGADGRGKPEVVSVVGRAGESVVLGCDLLPPAGRPPLHVIEWLRFGFLLPIFIQFGLYSPRIDPDYVG corresponding to amino acids 1-83 of AAQ88495 (SEQ ID NO:1695), which also corresponds to amino acids 1-83 of H61775_P16 (SEQ ID NO:1281), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DCGFPAFRELKRAETVSPVFFTRRCIWEDLKSTGFSPAGGGRPPGGGPRTQEDSGLPCWRSSCSVTLQV (SEQ ID NO: 1754) corresponding to amino acids 84-152 of H61775_P16 (SEQ ID NO:1281), wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of H61775_P16 (SEQ ID NO:1281), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DCGFPAFRELKRAETVSPVFFTRRCIWEDLKSTGFSPAGGGRPPGGGPRTQEDSGLPCWRSSCSVTLQV (SEQ ID NO: 1754) in H61775_P16 (SEQ ID NO:1281).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for H61775_P17 (SEQ ID NO:1282), comprising a first amino acid sequence being at least 90% homologous to MVWCLGLAVLSLVISQGADGRGKPEVVSVVGRAGESVVLGCDLLPPAGRPPLHVIEWLRFGFLLPIFIQFGLYSPRIDPDYVG corresponding to amino acids 11-93 of Q9P2J2 (SEQ ID NO:1694), which also corresponds to amino acids 1-83 of H61775_P17 (SEQ ID NO:1282).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for H61775_P17 (SEQ ID NO:1282), comprising a first amino acid sequence being at least 90% homologous to MVWCLGLAVLSLVISQGADGRGKPEVVSVVGRAGESVVLGCDLLPPAGRPPLHVIEWLRFGFLLPIFIQFGLYSPRIDPDYVG corresponding to amino acids 1-83 of AAQ88495 (SEQ ID NO:1695), which also corresponds to amino acids 1-83 of H61775_P17 (SEQ ID NO:1282).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding forM85491_PEA_(—)1_P13 (SEQ ID NO:1283), comprising a first amino acidsequence being at least 90% homologous toMALRRLGAALLLLPLLAAVEETLMDSTTATAELGWMVHPPSGWEEVSGYDENMNTIRTYQVCNVFESSQNNWLRTKFIRRRGAHRIHVEMKFSVRDCSSIPSVPGSCKETFNLYYYEADFDSATKTFPNWMENPWVKVDTIAADESFSQVDLGGRVMKINTEVRSFGPVSRSGFYLAFQDYGGCMSLIAVRVFYRKCPRIIQNGAIFQETLSGAESTSLVAARGSCIANAEEVDVPIKLYCNGDGEWLVPIGRCMCKAGFEAVENGTVCRGCPSGTFKANQGDEACTHCPINSRTTSEGATNCVCRNGYYRADLDPLDMPCTTIPSAPQAVISSVNETSLMLEWTPPRDSGGREDLVYNIICKSCGSGRGACTRCGDNVQYAPRQLGLTEPRIYISDLLAHTQYTFEIQAVNGVTDQSPFSPQFASVNITTNQAAPSAVSIMHQVSRTVDSITLSWSQPDQPNGVILDYELQYYEK corresponding toamino acids 1-476 of EPB2_HUMAN (SEQ ID NO:1417), which also correspondsto amino acids 1-476 of M85491_PEA_(—)1_P13 (SEQ ID NO:1283), and asecond amino acid sequence being at least 70%, optionally at least 80%,preferably at least 85%, more preferably at least 90% and mostpreferably at least 95% homologous to a polypeptide having the sequenceVPIGWVLSPSPTSLRAPLPG (SEQ ID NO: 1755) corresponding to amino acids477-496 of M85491_PEA_(—)1_P13 (SEQ ID NO:1283), wherein said first andsecond amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of M85491_PEA_1_P13 (SEQ ID NO:1283), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VPIGWVLSPSPTSLRAPLPG (SEQ ID NO: 1755) in M85491_PEA_1_P13 (SEQ ID NO:1283).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for M85491_PEA_1_P14 (SEQ ID NO:1284), comprising a first amino acid sequence being at least 90% homologous to MALRRLGAALLLLPLLAAVEETLMDSTTATAELGWMVHPPSGWEEVSGYDENMNTIRTYQVCNVFESSQNNWLRTKFIRRRGAHRIHVEMKFSVRDCSSIPSVPGSCKETFNLYYYEADFDSATKTFPNWMENPWVKVDTIAADESFSQVDLGGRVMKINTEVRSFGPVSRSGFYLAFQDYGGCMSLIAVRVFYRKCPRIIQNGAIFQETLSGAESTSLVAARGSCIANAEEVDVPIKLYCNGDGEWLVPIGRCMCKAGFEAVENGTVCR corresponding to amino acids 1-270 of EPB2_HUMAN (SEQ ID NO:1417), which also corresponds to amino acids 1-270 of M85491_PEA_1_P14 (SEQ ID NO:1284), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence ERQDLTMLSRLVLNSWPQMILPPQPPKVLEL (SEQ ID NO: 1756) corresponding to amino acids 271-301 of M85491_PEA_1_P14 (SEQ ID NO:1284), wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of M85491_PEA_1_P14 (SEQ ID NO:1284), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ERQDLTMLSRLVLNSWPQMILPPQPPKVLEL (SEQ ID NO: 1756) in M85491_PEA_1_P14 (SEQ ID NO:1284).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T39971_P6 (SEQ ID NO:1285), comprising a first amino acid sequence being at least 90% homologous to MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSCCTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTSDLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPPAEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVWGIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGVLDPDYPRNISDGFDGIPDNVDAALALPAHSYSGRERVYFFKG corresponding to amino acids 1-276 of VTNC_HUMAN (SEQ ID NO:1418), which also corresponds to amino acids 1-276 of T39971_P6 (SEQ ID NO:1285), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TQGVVGD (SEQ ID NO: 1757) corresponding to amino acids 277-283 of T39971_P6 (SEQ ID NO:1285), wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T39971_P6 (SEQ ID NO:1285), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TQGVVGD (SEQ ID NO: 1757) in T39971_P6 (SEQ ID NO:1285).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T39971_P9 (SEQ ID NO:1286), comprising a first amino acid sequence being at least 90% homologous to MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSCCTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTSDLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPPAEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVWGIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGVLDPDYPRNISDGFDGIPDNVDAALALPAHSYSGRERVYFFKGKQYWEYQFQHQPSQEECEGSSLSAVFEHFAMMQRDSWEDIFELLFWGRT corresponding to amino acids 1-325 of VTNC_HUMAN (SEQ ID NO:1418), which also corresponds to amino acids 1-325 of T39971_P9 (SEQ ID NO:1286), and a second amino acid sequence being at least 90% homologous to SGMAPRPSLAKKQRFRHRNRKGYRSQRGHSRGRNQNSRRPSRATWLSLFSSEESNLGANNYDDYRMDWLVPATCEPIQSVFFFSGDKYYRVNLRTRRVDTVDPPYPRSIAQYWLGCPAPGHL corresponding to amino acids 357-478 of VTNC_HUMAN (SEQ ID NO:1418), which also corresponds to amino acids 326-447 of T39971_P9 (SEQ ID NO:1286), wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of T39971_P9 (SEQ ID NO:1286), comprising a polypeptide having a length “n”, wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise TS, having a structure as follows: a sequence starting from any of amino acid numbers 325−x to 325; and ending at any of amino acid numbers 326+((n−2)−x), in which x varies from 0 to n−2.
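The edge-portion formula can be read as a sliding window of length n that always contains both residues flanking the junction. The following Python sketch enumerates those windows under one reading of the formula, namely that each value of x pairs the start position (junction − x) with the end position (junction + 1) + ((n − 2) − x). The function name `edge_windows` and its parameters are hypothetical and are introduced only for illustration.

```python
from typing import List, Tuple


def edge_windows(junction: int, n: int) -> List[Tuple[int, int]]:
    """List 1-based inclusive (start, end) positions of length-n windows spanning a junction.

    junction is the last residue of the first segment (for example 325 for
    T39971_P9); the residue after it (326) is the first residue of the second segment.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to cover both junction residues")
    windows = []
    for x in range(n - 1):  # x varies from 0 to n-2
        start = junction - x
        end = (junction + 1) + ((n - 2) - x)
        if start >= 1:  # skip windows that would begin before residue 1
            windows.append((start, end))
    return windows


# Edge portion of T39971_P9 with the minimum recited length n = 10.
for start, end in edge_windows(325, 10):
    print(start, end)
```

For n = 10 this yields nine windows, from positions 325-334 (x = 0) to positions 317-326 (x = 8). Each window is 10 residues long and includes residues 325 and 326.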

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T39971_P11 (SEQ ID NO:1287), comprising a first amino acid sequence being at least 90% homologous to MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSCCTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTSDLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPPAEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVWGIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGVLDPDYPRNISDGFDGIPDNVDAALALPAHSYSGRERVYFFKGKQYWEYQFQHQPSQEECEGSSLSAVFEHFAMMQRDSWEDIFELLFWGRTS corresponding to amino acids 1-326 of VTNC_HUMAN (SEQ ID NO:1418), which also corresponds to amino acids 1-326 of T39971_P11 (SEQ ID NO:1287), and a second amino acid sequence being at least 90% homologous to DKYYRVNLRTRRVDTVDPPYPRSIAQYWLGCPAPGHL corresponding to amino acids 442-478 of VTNC_HUMAN (SEQ ID NO:1418), which also corresponds to amino acids 327-363 of T39971_P11 (SEQ ID NO:1287), wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of T39971_P11 (SEQ ID NO:1287), comprising a polypeptide having a length “n”, wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise SD, having a structure as follows: a sequence starting from any of amino acid numbers 326−x to 326; and ending at any of amino acid numbers 327+((n−2)−x), in which x varies from 0 to n−2.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T39971_P11 (SEQ ID NO:1287), comprising a first amino acid sequence being at least 90% homologous to MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSCCTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTSDLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPPAEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVWGIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGVLDPDYPRNISDGFDGIPDNVDAALALPAHSYSGRERVYFFKGKQYWEYQFQHQPSQEECEGSSLSAVFEHFAMMQRDSWEDIFELLFWGRTS corresponding to amino acids 1-326 of Q9BSH7, which also corresponds to amino acids 1-326 of T39971_P11 (SEQ ID NO:1287), and a second amino acid sequence being at least 90% homologous to DKYYRVNLRTRRVDTVDPPYPRSIAQYWLGCPAPGHL corresponding to amino acids 442-478 of Q9BSH7, which also corresponds to amino acids 327-363 of T39971_P11 (SEQ ID NO:1287), wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of T39971_P11 (SEQ ID NO:1287), comprising a polypeptide having a length “n”, wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise SD, having a structure as follows: a sequence starting from any of amino acid numbers 326−x to 326; and ending at any of amino acid numbers 327+((n−2)−x), in which x varies from 0 to n−2.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T39971_P12 (SEQ ID NO:1288), comprising a first amino acid sequence being at least 90% homologous to MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSCCTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTSDLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPPAEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVWGIEGPIDAAFTRINCQGKTYLFK corresponding to amino acids 1-223 of VTNC_HUMAN (SEQ ID NO:1418), which also corresponds to amino acids 1-223 of T39971_P12 (SEQ ID NO:1288), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VPGAVGQGRKHLGRV (SEQ ID NO: 1758) corresponding to amino acids 224-238 of T39971_P12 (SEQ ID NO:1288), wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T39971_P12 (SEQ ID NO:1288), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VPGAVGQGRKHLGRV (SEQ ID NO: 1758) in T39971_P12 (SEQ ID NO:1288).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T39971_P12 (SEQ ID NO:1288), comprising a first amino acid sequence being at least 90% homologous to MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSCCTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTSDLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPPAEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVWGIEGPIDAAFTRINCQGKTYLFK corresponding to amino acids 1-223 of Q9BSH7, which also corresponds to amino acids 1-223 of T39971_P12 (SEQ ID NO:1288), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VPGAVGQGRKHLGRV (SEQ ID NO: 1758) corresponding to amino acids 224-238 of T39971_P12 (SEQ ID NO:1288), wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T39971_P12 (SEQ ID NO:1288), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VPGAVGQGRKHLGRV (SEQ ID NO: 1758) in T39971_P12 (SEQ ID NO:1288).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for Z21368_PEA_1_P2 (SEQ ID NO:1289), comprising a first amino acid sequence being at least 90% homologous to MKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERKNIRPNIILVLTDDQDVELGSLQVMNKTRKIMEHGGATFINAFVTTPMCCPSRSSMLTGKYVHNHNVYTNNENCSSPSWQAMHEPRTFAVYLNNTGYRTAFFGKYLNEYNGSYIPPGWREWLGLIKNSRFYNYTVCRNGIKEKHGFDYAKDYFTDLITNESINYFKMSKRMYPHRPVMMVISHAAPHGPEDSAPQFSKLYPNASQHITPSYNYAPNMDKHWIMQYTGPMLPIHMEFTNILQRKRLQTLMSVDDSVERLYNMLVETGELENTYIIYTADHGYHIGQFGLVKGKSMPYDFDIRVPFFIRGPSVEPGSIVPQIVLNIDLAPTILDIAGLDTPPDVDGKSVLKLLDPEKPGNRFRTNKKAKIWRDTFLVERGKFLRKKEESSKNIQQSNHLPKYERVKELCQQARYQTACEQPGQKWQCIEDTSGKLRIHKCKGPSDLLTVRQSTRNLYARGFHDKDKECSCRESGYRASRSQRKSQRQFLRNQGTPKYKPRFVHTRQTRSLSVEFEGEIYDINLEEEEELQVLQPRNIAKRHDEGHKGPRDLQASSGGNRGRMLADSSNAVGPPTTVRVTHKCFILPNDSIHCERELYQSARAWKDHKAYIDKEIEALQDKIKNLREVRGHLKRRKPEECSCSKQSYYNKEKGVKKQEKLKSHLHPFKEAAQEVDSKLQLFKENNRRRKICERKEKRRQRKGEECSLPGLTCFTHDNNHWQTAPFWN corresponding to amino acids 1-761 of SUL1_HUMAN (SEQ ID NO:1419), which also corresponds to amino acids 1-761 of Z21368_PEA_1_P2 (SEQ ID NO:1289), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence PHKYSAHGRTRHFESATRTTNGAQKLSRI (SEQ ID NO: 1759) corresponding to amino acids 762-790 of Z21368_PEA_1_P2 (SEQ ID NO:1289), wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of Z21368_PEA_1_P2 (SEQ ID NO:1289), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence PHKYSAHGRTRHFESATRTTNGAQKLSRI (SEQ ID NO: 1759) in Z21368_PEA_1_P2 (SEQ ID NO:1289).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for Z21368_PEA_1_P5 (SEQ ID NO:1290), comprising a first amino acid sequence being at least 90% homologous to MKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERKNIRPNIILVLTDDQDVEL corresponding to amino acids 1-57 of Q7Z2W2 (SEQ ID NO:1697), which also corresponds to amino acids 1-57 of Z21368_PEA_1_P5 (SEQ ID NO:1290), a second bridging amino acid sequence comprising A, and a third amino acid sequence being at least 90% homologous to FFGKYLNEYNGSYIPPGWREWLGLIKNSRFYNYTVCRNGIKEKHGFDYAKDYFTDLITNESINYFKMSKRMYPHRPVMMVISHAAPHGPEDSAPQFSKLYPNASQHITPSYNYAPNMDKHWIMQYTGPMLPIHMEFTNILQRKRLQTLMSVDDSVERLYNMLVETGELENTYIIYTADHGYHIGQFGLVKGKSMPYDFDIRVPFFIRGPSVEPGSIVPQIVLNIDLAPTILDIAGLDTPPDVDGKSVLKLLDPEKPGNRFRTNKKAKIWRDTFLVERGKFLRKKEESSKNIQQSNHLPKYERVKELCQQARYQTACEQPGQKWQCIEDTSGKLRIHKCKGPSDLLTVRQSTRNLYARGFHDKDKECSCRESGYRASRSQRKSQRQFLRNQGTPKYKPRFVHTRQTRSLSVEFEGEIYDINLEEEEELQVLQPRNIAKRHDEGHKGPRDLQASSGGNRGRMLADSSNAVGPPTTVRVTHKCFILPNDSIHCERELYQSARAWKDHKAYIDKEIEALQDKIKNLREVRGHLKRRKPEECSCSKQSYYNKEKGVKKQEKLKSHLHPFKEAAQEVDSKLQLFKENNRRRKKERKEKRRQRKGEECSLPGLTCFTHDNNHWQTAPFWNLGSFCACTSSNNNTYWCLRTVNETHNFLFCEFATGFLEYFDMNTDPYQLTNTVHTVERGILNQLHVQLMELRSCQGYKQCNPRPKNLDVGNKDGGSYDLHRGQLWDGWEG corresponding to amino acids 139-871 of Q7Z2W2 (SEQ ID NO:1697), which also corresponds to amino acids 59-791 of Z21368_PEA_1_P5 (SEQ ID NO:1290), wherein said first, second and third amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for an edge portion of Z21368_PEA_1_P5 (SEQ ID NO:1290), comprising a polypeptide having a length “n”, wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise LAF having a structure as follows (numbering according to Z21368_PEA_1_P5 (SEQ ID NO:1290)): a sequence starting from any of amino acid numbers 57−x to 57; and ending at any of amino acid numbers 59+((n−2)−x), in which x varies from 0 to n−2.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for Z21368_PEA_1_P5 (SEQ ID NO:1290), comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERKNIRPNIILVLTDDQDVELAFFGKYLNEYNGSYIPPGWREWLGLIKNSRFYNYTVCRNGIKEKHGFDYAKDYFTDLITNESINYFKMSKRMYPHRPVMMVISHAAPHGPEDSAPQFSKLYPNASQHITPSYNYAPNMDKHWIMQYTGPMLPIHMEFTNILQRKRLQTLMSVDDSVERLYNMLVETGELENTYIIYTADHGYHIGQFGLVKGKSMPYDFDIRVPFFIRGPSVEPGSIVPQIVLNIDLAPTILDIAGLDTPPDVDGKSVLKLLDPEKPGNRFRTNKKAKIWRDTFLVERGKFLRKKEESSKNIQQSNHLPKYERVKELCQQARYQTACEQPGQKWQCIEDTSGKLRIHKCKGPSDLLTVRQSTRNLYARGFHDKDKECSCRESGYRASRSQRKSQRQFLRNQGTPKYKPRFVHTRQTRSLSVEFEGEIYDINLEEEEELQVLQPRNIAKRHDEGHKGPRDLQASSGGNRGRMLADSSNAVGPPTTVRVTHKCFILPNDSIHCERELYQSARAWKDHKAYIDKEIEALQDKIKNLREVRGHLKRRKPEECSCSKQSYYNKEKGVKKQEKLKSHLHPFKEAAQEVDSKLQLFKENNRRRKKERKEKRRQRKGEECSLPGLTCFTHDNNHWQTAPFWNLGSFCACTSSNNNTYWCLRTVNETHNFLFCEFATGFLEYFDMNTDPYQLTNTVHTVERGILNQLHVQLME (SEQ ID NO: 1760) corresponding to amino acids 1-751 of Z21368_PEA_1_P5 (SEQ ID NO:1290), and a second amino acid sequence being at least 90% homologous to LRSCQGYKQCNPRPKNLDVGNKDGGSYDLHRGQLWDGWEG corresponding to amino acids 1-40 of AAH12997 (SEQ ID NO:1698), which also corresponds to amino acids 752-791 of Z21368_PEA_1_P5 (SEQ ID NO:1290), wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of Z21368_PEA_1_P5 (SEQ ID NO:1290), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERKNIRPNIILVLTDDQDVELAFFGKYLNEYNGSYIPPGWREWLGLIKNSRFYNYTVCRNGIKEKHGFDYAKDYFTDLITNESINYFKMSKRMYPHRPVMMVISHAAPHGPEDSAPQFSKLYPNASQHITPSYNYAPNMDKHWIMQYTGPMLPIHMEFTNILQRKRLQTLMSVDDSVERLYNMLVETGELENTYIIYTADHGYHIGQFGLVKGKSMPYDFDIRVPFFIRGPSVEPGSIVPQIVLNIDLAPTILDIAGLDTPPDVDGKSVLKLLDPEKPGNRFRTNKKAKIWRDTFLVERGKFLRKKEESSKNIQQSNHLPKYERVKELCQQARYQTACEQPGQKWQCIEDTSGKLRIHKCKGPSDLLTVRQSTRNLYARGFHDKDKECSCRESGYRASRSQRKSQRQFLRNQGTPKYKPRFVHTRQTRSLSVEFEGEIYDINLEEEEELQVLQPRNIAKRHDEGHKGPRDLQASSGGNRGRMLADSSNAVGPPTTVRVTHKCFILPNDSIHCERELYQSARAWKDHKAYIDKEIEALQDKIKNLREVRGHLKRRKPEECSCSKQSYYNKEKGVKKQEKLKSHLHPFKEAAQEVDSKLQLFKENNRRRKKERKEKRRQRKGEECSLPGLTCFTHDNNHWQTAPFWNLGSFCACTSSNNNTYWCLRTVNETHNFLFCEFATGFLEYFDMNTDPYQLTNTVHTVERGILNQLHVQLME (SEQ ID NO: 1760) of Z21368_PEA_1_P5 (SEQ ID NO:1290).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for Z21368_PEA_1_P5 (SEQ ID NO:1290), comprising a first amino acid sequence being at least 90% homologous to MKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERKNIRPNIILVLTDDQDVEL corresponding to amino acids 1-57 of SUL1_HUMAN (SEQ ID NO:1419), which also corresponds to amino acids 1-57 of Z21368_PEA_1_P5 (SEQ ID NO:1290), and a second amino acid sequence being at least 90% homologous to AFFGKYLNEYNGSYIPPGWREWLGLIKNSRFYNYTVCRNGIKEKHGFDYAKDYFTDLITNESINYFKMSKRMYPHRPVMMVISHAAPHGPEDSAPQFSKLYPNASQHITPSYNYAPNMDKHWIMQYTGPMLPIHMEFTNILQRKRLQTLMSVDDSVERLYNMLVETGELENTYIIYTADHGYHIGQFGLVKGKSMPYDFDIRVPFFIRGPSVEPGSIVPQIVLNIDLAPTILDIAGLDTPPDVDGKSVLKLLDPEKPGNRFRTNKKAKIWRDTFLVERGKFLRKKEESSKNIQQSNHLPKYERVKELCQQARYQTACEQPGQKWQCIEDTSGKLRIHKCKGPSDLLTVRQSTRNLYARGFHDKDKECSCRESGYRASRSQRKSQRQFLRNQGTPKYKPRFVHTRQTRSLSVEFEGEIYDINLEEEEELQVLQPRNIAKRHDEGHKGPRDLQASSGGNRGRMLADSSNAVGPPTTVRVTHKCFILPNDSIHCERELYQSARAWKDHKAYIDKEIEALQDKIKNLREVRGHLKRRKPEECSCSKQSYYNKEKGVKKQEKLKSHLHPFKEAAQEVDSKLQLFKENNRRRKKERKEKRRQRKGEECSLPGLTCFTHDNNHWQTAPFWNLGSFCACTSSNNNTYWCLRTVNETHNFLFCEFATGFLEYFDMNTDPYQLTNTVHTVERGILNQLHVQLMELRSCQGYKQCNPRPKNLDVGNKDGGSYDLHRGQLWDGWEG corresponding to amino acids 138-871 of SUL1_HUMAN (SEQ ID NO:1419), which also corresponds to amino acids 58-791 of Z21368_PEA_1_P5 (SEQ ID NO:1290), wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of Z21368_PEA_1_P5 (SEQ ID NO:1290), comprising a polypeptide having a length “n”, wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise LA, having a structure as follows: a sequence starting from any of amino acid numbers 57−x to 57; and ending at any of amino acid numbers 58+((n−2)−x), in which x varies from 0 to n−2.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for Z21368_PEA_1_P15 (SEQ ID NO:1291), comprising a first amino acid sequence being at least 90% homologous to MKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERKNIRPNIILVLTDDQDVELGSLQVMNKTRKIMEHGGATFINAFVTTPMCCPSRSSMLTGKYVHNHNVYTNNENCSSPSWQAMHEPRTFAVYLNNTGYRTAFFGKYLNEYNGSYIPPGWREWLGLIKNSRFYNYTVCRNGIKEKHGFDYAKDYFTDLITNESINYFKMSKRMYPHRPVMMVISHAAPHGPEDSAPQFSKLYPNASQHITPSYNYAPNMDKHWIMQYTGPMLPIHMEFTNILQRKRLQTLMSVDDSVERLYNMLVETGELENTYIIYTADHGYHIGQFGLVKGKSMPYDFDIRVPFFIRGPSVEPGSIVPQIVLNIDLAPTILDIAGLDTPPDVDGKSVLKLLDPEKPGNRFRTNKKAKIWRDTFLVERG corresponding to amino acids 1-416 of SUL1_HUMAN (SEQ ID NO:1419), which also corresponds to amino acids 1-416 of Z21368_PEA_1_P15 (SEQ ID NO:1291).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for Z21368_PEA_1_P16 (SEQ ID NO:1292), comprising a first amino acid sequence being at least 90% homologous to MKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERKNIRPNIILVLTDDQDVELGSLQVMNKTRKIMEHGGATFINAFVTTPMCCPSRSSMLTGKYVHNHNVYTNNENCSSPSWQAMHEPRTFAVYLNNTGYRTAFFGKYLNEYNGSYIPPGWREWLGLIKNSRFYNYTVCRNGIKEKHGFDYAKDYFTDLITNESINYFKMSKRMYPHRPVMMVISHAAPHGPEDSAPQFSKLYPNASQHITPSYNYAPNMDKHWIMQYTGPMLPIHMEFTNILQRKRLQTLMSVDDSVERLYNMLVETGELENTYIIYTADHGYHIGQFGLVKGKSMPYDFDIRVPFFIRGPSVEPGSIVPQIVLNIDLAPTILDIAGLDTPPDVDGKSVLKLLDPEKPGNR corresponding to amino acids 1-397 of SUL1_HUMAN (SEQ ID NO:1419), which also corresponds to amino acids 1-397 of Z21368_PEA_1_P16 (SEQ ID NO:1292), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence CVIVPPLSQPQIH (SEQ ID NO: 1761) corresponding to amino acids 398-410 of Z21368_PEA_1_P16 (SEQ ID NO:1292), wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of Z21368_PEA_1_P16 (SEQ ID NO:1292), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence CVIVPPLSQPQIH (SEQ ID NO: 1761) in Z21368_PEA_1_P16 (SEQ ID NO:1292).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for Z21368_PEA_1_P22 (SEQ ID NO:1293), comprising a first amino acid sequence being at least 90% homologous to MKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERKNIRPNIILVLTDDQDVELGSLQVMNKTRKIMEHGGATFINAFVTTPMCCPSRSSMLTGKYVHNHNVYTNNENCSSPSWQAMHEPRTFAVYLNNTGYRTAFFGKYLNEYNGSYIPPGWREWLGLIKNSRFYNYTVCRNGIKEKHGFDYAK corresponding to amino acids 1-188 of SUL1_HUMAN (SEQ ID NO:1419), which also corresponds to amino acids 1-188 of Z21368_PEA_1_P22 (SEQ ID NO:1293), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence ARYDGDQPRCAPRPRGLSPTVF (SEQ ID NO: 1762) corresponding to amino acids 189-210 of Z21368_PEA_1_P22 (SEQ ID NO:1293), wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of Z21368_PEA_1_P22 (SEQ ID NO:1293), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ARYDGDQPRCAPRPRGLSPTVF (SEQ ID NO: 1762) in Z21368_PEA_1_P22 (SEQ ID NO:1293).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for Z21368_PEA_1_P23 (SEQ ID NO:1294), comprising a first amino acid sequence being at least 90% homologous to MKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERKNIRPNIILVLTDDQDVELGSLQVMNKTRKIMEHGGATFINAFVTTPMCCPSRSSMLTGKYVHNHNVYTNNENCSSPSWQAMHEPRTFAVYLNNTGYRT corresponding to amino acids 1-137 of Q7Z2W2 (SEQ ID NO:1697), which also corresponds to amino acids 1-137 of Z21368_PEA_1_P23 (SEQ ID NO:1294), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GLLHRLNH (SEQ ID NO: 1763) corresponding to amino acids 138-145 of Z21368_PEA_1_P23 (SEQ ID NO:1294), wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of Z21368_PEA_1_P23 (SEQ ID NO:1294), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GLLHRLNH (SEQ ID NO: 1763) in Z21368_PEA_1_P23 (SEQ ID NO:1294).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for Z21368_PEA_1_P23 (SEQ ID NO:1294), comprising a first amino acid sequence being at least 90% homologous to MKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERKNIRPNIILVLTDDQDVELGSLQVMNKTRKIMEHGGATFINAFVTTPMCCPSRSSMLTGKYVHNHNVYTNNENCSSPSWQAMHEPRTFAVYLNNTGYRT corresponding to amino acids 1-137 of SUL1_HUMAN (SEQ ID NO:1419), which also corresponds to amino acids 1-137 of Z21368_PEA_1_P23 (SEQ ID NO:1294), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GLLHRLNH (SEQ ID NO: 1763) corresponding to amino acids 138-145 of Z21368_PEA_1_P23 (SEQ ID NO:1294), wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of Z21368_PEA_1_P23 (SEQ ID NO:1294), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GLLHRLNH (SEQ ID NO: 1763) in Z21368_PEA_1_P23 (SEQ ID NO:1294).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMGRP5E_P4 (SEQ ID NO:1299), comprising a first amino acid sequence being at least 90% homologous to MRGSELPLVLLALVLCLAPRGRAVPLPAGGGTVLTKMYPRGNHWAVGHLMGKKSTGESSSVSERGSLKQQLREYIRWEEAARNLLGLIEAKENRNHQPPQPKALGNQQPSWDSEDSSNFKDVGSKGK corresponding to amino acids 1-127 of GRP_HUMAN (SEQ ID NO:1421), which also corresponds to amino acids 1-127 of HUMGRP5E_P4 (SEQ ID NO:1299), and a second amino acid sequence being at least 90% homologous to GSQREGRNPQLNQQ corresponding to amino acids 135-148 of GRP_HUMAN (SEQ ID NO:1421), which also corresponds to amino acids 128-141 of HUMGRP5E_P4 (SEQ ID NO:1299), wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HUMGRP5E_P4 (SEQ ID NO:1299), comprising a polypeptide having a length “n”, wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise KG, having a structure as follows: a sequence starting from any of amino acid numbers 127−x to 127; and ending at any of amino acid numbers 128+((n−2)−x), in which x varies from 0 to n−2.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMGRP5E_P5 (SEQ ID NO:1300), comprising a first amino acid sequence being at least 90% homologous to MRGSELPLVLLALVLCLAPRGRAVPLPAGGGTVLTKMYPRGNHWAVGHLMGKKSTGESSSVSERGSLKQQLREYIRWEEAARNLLGLIEAKENRNHQPPQPKALGNQQPSWDSEDSSNFKDVGSKGK corresponding to amino acids 1-127 of GRP_HUMAN (SEQ ID NO:1421), which also corresponds to amino acids 1-127 of HUMGRP5E_P5 (SEQ ID NO:1300), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DSLLQVLNVKEGTPS (SEQ ID NO: 1764) corresponding to amino acids 128-142 of HUMGRP5E_P5 (SEQ ID NO:1300), wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMGRP5E_P5 (SEQ ID NO:1300), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DSLLQVLNVKEGTPS (SEQ ID NO: 1764) in HUMGRP5E_P5 (SEQ ID NO:1300).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for D56406_PEA_1_P2 (SEQ ID NO:1301), comprising a first amino acid sequence being at least 90% homologous to MMAGMKIQLVCMLLLAFSSWSLCSDSEEEMKALEADFLTNMHTSKISKAHVPSWKMTLLNVCSLVNNLNSPAEETGEVHEEELVARRKLPTALDGFSLEAMLTIYQLHKICHSRAFQHWE corresponding to amino acids 1-120 of NEUT_HUMAN (SEQ ID NO:1422), which also corresponds to amino acids 1-120 of D56406_PEA_1_P2 (SEQ ID NO:1301), a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence ARWLTPVIPALWEAETGGSRGQEMETIPANT (SEQ ID NO: 1773) corresponding to amino acids 121-151 of D56406_PEA_1_P2 (SEQ ID NO:1301), and a third amino acid sequence being at least 90% homologous to LIQEDILDTGNDKNGKEEVIKRKIPYILKRQLYENKPRRPYILKRDSYYY corresponding to amino acids 121-170 of NEUT_HUMAN (SEQ ID NO:1422), which also corresponds to amino acids 152-201 of D56406_PEA_1_P2 (SEQ ID NO:1301), wherein said first, second and third amino acid sequences are contiguous and in a sequential order.
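The coordinate bookkeeping in such chimeric definitions, where contiguous segments taken from a known protein and a novel insert are numbered consecutively in the variant, can be checked with a small Python sketch. This is offered only as an illustrative aid; the helper names `span_length` and `variant_ranges` are hypothetical and are not part of the specification.

```python
from typing import List, Tuple


def span_length(start: int, end: int) -> int:
    """Number of residues in a 1-based inclusive range."""
    return end - start + 1


def variant_ranges(lengths: List[int]) -> List[Tuple[int, int]]:
    """Lay contiguous segments end to end and return their 1-based ranges in the variant."""
    ranges = []
    next_start = 1
    for length in lengths:
        if length <= 0:
            raise ValueError("segment lengths must be positive")
        ranges.append((next_start, next_start + length - 1))
        next_start += length
    return ranges


# D56406_PEA_1_P2 as described in the text: residues 1-120 of NEUT_HUMAN,
# the novel insert (SEQ ID NO: 1773), then residues 121-170 of NEUT_HUMAN.
INSERT = "ARWLTPVIPALWEAETGGSRGQEMETIPANT"
lengths = [span_length(1, 120), len(INSERT), span_length(121, 170)]
print(variant_ranges(lengths))
```

The resulting ranges, 1-120, 121-151 and 152-201, agree with the positions recited for D56406_PEA_1_P2.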

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for an edge portion of D56406_PEA_1_P2 (SEQ ID NO:1301), comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for ARWLTPVIPALWEAETGGSRGQEMETIPANT (SEQ ID NO: 1773), corresponding to D56406_PEA_1_P2 (SEQ ID NO:1301).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for D56406_PEA_1_P5 (SEQ ID NO:1302), comprising a first amino acid sequence being at least 90% homologous to MMAGMKIQLVCMLLLAFSSWSLC corresponding to amino acids 1-23 of NEUT_HUMAN (SEQ ID NO:1422), which also corresponds to amino acids 1-23 of D56406_PEA_1_P5 (SEQ ID NO:1302), and a second amino acid sequence being at least 90% homologous to SEEEMKALEADFLTNMHTSKISKAHVPSWKMTLLNVCSLVNNLNSPAEETGEVHEEELVARRKLPTALDGFSLEAMLTIYQLHKICHSRAFQHWELIQEDILDTGNDKNGKEEVIKRKIPYILKRQLYENKPRRPYILKRDSYYY corresponding to amino acids 26-170 of NEUT_HUMAN (SEQ ID NO:1422), which also corresponds to amino acids 24-168 of D56406_PEA_1_P5 (SEQ ID NO:1302), wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of D56406_PEA_1_P5 (SEQ ID NO:1302), comprising a polypeptide having a length “n”, wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise CS, having a structure as follows: a sequence starting from any of amino acid numbers 23−x to 24; and ending at any of amino acid numbers 24+((n−2)−x), in which x varies from 0 to n−2.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for D56406_PEA_1_P6 (SEQ ID NO:1303), comprising a first amino acid sequence being at least 90% homologous to MMAGMKIQLVCMLLLAFSSWSLCSDSEEEMKALEADFLTNMHTSK corresponding to amino acids 1-45 of NEUT_HUMAN (SEQ ID NO:1422), which also corresponds to amino acids 1-45 of D56406_PEA_1_P6 (SEQ ID NO:1303), and a second amino acid sequence being at least 90% homologous to LIQEDILDTGNDKNGKEEVIKRKIPYILKRQLYENKPRRPYILKRDSYYY corresponding to amino acids 121-170 of NEUT_HUMAN (SEQ ID NO:1422), which also corresponds to amino acids 46-95 of D56406_PEA_1_P6 (SEQ ID NO:1303), wherein said first and second amino acid sequences are contiguous and in a sequential order.
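By way of illustration only, the requirement that the first and second amino acid sequences be "contiguous and in a sequential order" can be sketched as a coordinate check. The sketch joins the two segments recited above and confirms that each segment's length matches its stated position range and that the ranges abut without gaps. The helper `assemble_chimera` and the `Segment` layout are hypothetical and are not part of this specification.

```python
from typing import List, Tuple

# (sequence, start, end) with 1-based inclusive positions in the variant
Segment = Tuple[str, int, int]


def assemble_chimera(segments: List[Segment]) -> str:
    """Join segments after checking that their stated ranges are gap-free and ordered."""
    expected_start = 1
    parts = []
    for seq, start, end in segments:
        if start != expected_start:
            raise ValueError(f"segment starts at {start}, expected {expected_start}")
        if len(seq) != end - start + 1:
            raise ValueError(f"segment length {len(seq)} does not match {start}-{end}")
        parts.append(seq)
        expected_start = end + 1
    return "".join(parts)


# Segments and position ranges as recited above for SEQ ID NO:1303.
P6_SEGMENTS = [
    ("MMAGMKIQLVCMLLLAFSSWSLCSDSEEEMKALEADFLTNMHTSK", 1, 45),
    ("LIQEDILDTGNDKNGKEEVIKRKIPYILKRQLYENKPRRPYILKRDSYYY", 46, 95),
]

p6 = assemble_chimera(P6_SEGMENTS)
print(len(p6), p6[44:46])  # total length and the two residues at the junction
```

The two residues printed at positions 45 and 46 are the junction residues that distinguish the variant from the known protein.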

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of D56406_PEA_1_P6 (SEQ ID NO:1303), comprising a polypeptide having a length “n”, wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise KL, having a structure as follows: a sequence starting from any of amino acid numbers 45−x to 46; and ending at any of amino acid numbers 46+((n−2)−x), in which x varies from 0 to n−2.
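By way of illustration only, the edge-portion structure recited above can be read as a set of windows of length n, each of which contains the junction residues at positions 45 and 46. The sketch below enumerates those windows, taking the start as 45−x and the end as 46+((n−2)−x) for x from 0 to n−2. It assumes that windows extending beyond either end of the 95-residue variant are skipped, which the text does not state. The function name `edge_windows` is hypothetical.

```python
# Variant sequence built from the two segments recited for SEQ ID NO:1303.
P6 = (
    "MMAGMKIQLVCMLLLAFSSWSLCSDSEEEMKALEADFLTNMHTSK"
    "LIQEDILDTGNDKNGKEEVIKRKIPYILKRQLYENKPRRPYILKRDSYYY"
)
JUNCTION = 45  # last residue of the first segment (1-based)


def edge_windows(seq, junction, n):
    """Return (start, end, fragment) for each length-n window spanning the junction.

    Positions are 1-based and inclusive. For each x in 0..n-2 the window
    starts at junction - x and ends at (junction + 1) + ((n - 2) - x).
    """
    if n < 2:
        raise ValueError("n must be at least 2 to span both junction residues")
    windows = []
    for x in range(n - 1):  # x = 0 .. n-2
        start = junction - x
        end = (junction + 1) + ((n - 2) - x)
        if start < 1 or end > len(seq):
            continue  # assumption: windows falling outside the sequence are skipped
        windows.append((start, end, seq[start - 1:end]))
    return windows


for start, end, fragment in edge_windows(P6, JUNCTION, 10):
    print(start, end, fragment)
```

Every window has length n by construction, since end − start + 1 = n regardless of x, and every window contains both junction residues.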

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for F05068_PEA_1_P7 (SEQ ID NO:1304), comprising a first amino acid sequence being at least 90% homologous to MKLVSVALMYLGSLAFLGADTARLDVASEFRKK corresponding to amino acids 1-33 of ADML_HUMAN (SEQ ID NO:1423), which also corresponds to amino acids 1-33 of F05068_PEA_1_P7 (SEQ ID NO:1304).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for F05068_PEA_1_P8 (SEQ ID NO:1305), comprising a first amino acid sequence being at least 90% homologous to MKLVSVALMYLGSLAFLGADTARLDVASEFRKKWNKWALSRGKRELRMSSSYPTGLADVKAGPAQTLIRPQDMKGASRSPED corresponding to amino acids 1-82 of ADML_HUMAN (SEQ ID NO:1423), which also corresponds to amino acids 1-82 of F05068_PEA_1_P8 (SEQ ID NO:1305), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence R corresponding to amino acids 83-83 of F05068_PEA_1_P8 (SEQ ID NO:1305), wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for H14624_P15 (SEQ ID NO:1306), comprising a first amino acid sequence being at least 90% homologous to MLQGPGSLLLLFLASHCCLGSARGLFLFGQPDFSYKRSNCKPIPANLQLCHGIEYQNMRLPNLLGHETMKEVLEQAGAWIPLVMKQCHPDTKKFLCSLFAPVCLDDLDETIQPCHSLCVQVKDRCAPVMSAFGFPWPDMLECDRFPQDNDLCIPLASSDHLLPATEE corresponding to amino acids 1-167 of Q9HAP5 (SEQ ID NO:1701), which also corresponds to amino acids 1-167 of H14624_P15 (SEQ ID NO:1306), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GKPSLLLPHSLLG (SEQ ID NO: 1765) corresponding to amino acids 168-180 of H14624_P15 (SEQ ID NO:1306), wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of H14624_P15 (SEQ ID NO:1306), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GKPSLLLPHSLLG (SEQ ID NO: 1765) in H14624_P15 (SEQ ID NO:1306).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding forH38804_PEA_(—)1_P5 (SEQ ID NO:1307), comprising a first amino acidsequence being at least 70%, optionally at least 80%, preferably atleast 85%, more preferably at least 90% and most preferably at least 95%homologous to a polypeptide having the sequenceMGRVRTLAGECSAQAQAQSLLAVVLSAPPSGGTPSARLSVRSPSPRDPWGLWAPVLQ (SEQ ID NO:1766) corresponding to amino acids 1-57 of H38804_PEA_(—)1_P5 (SEQ IDNO:1307), and a second amino acid sequence being at least 90% homologoustoMTGSNEFKLNQPPEDGISSVKFSPNTSQFLLVSSWDTSVRLYDVPANSMRLKYQHTGAVLDCAFYDPTHAWSGGLDHQLKMHDLNTDQENLVGTHDAPIRCVEYCPEVNVMVTGSWDQTVKLWDPRTPCNAGTFSQPEKVYTLSVSGDRLIVGTAGRRVLVWDLRNMGYVQQRRESSLKYQTRCIRAFPNKQGYVLSSIEGRVAVEYLDPSPEVQKKKYAFKCHRLKENNIEQIYPVNAISFHNIHNTFATGGSDGFVNIWDPFNKKRLCQFHRYPTSIASLAFSNDGTTLAIASSYMYEMDDTEHPEDGIFIRQVTDAETKPK corresponding to aminoacids 1-324 of BUB3_HUMAN (SEQ ID NO:1424), which also corresponds toamino acids 58-381 of H38804_PEA_(—)1_P5 (SEQ ID NO:1307), wherein saidfirst and second amino acid sequences are contiguous and in a sequentialorder.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of H38804_PEA_1_P5 (SEQ ID NO:1307), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MGRVRTLAGECSAQAQAQSLLAVVLSAPPSGGTPSARLSVRSPSPRDPWGLWAPVLQ (SEQ ID NO: 1766) of H38804_PEA_1_P5 (SEQ ID NO:1307).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding forH38804_PEA_(—)1_P17 (SEQ ID NO:1308), comprising a first amino acidsequence being at least 70%, optionally at least 80%, preferably atleast 85%, more preferably at least 90% and most preferably at least 95%homologous to a polypeptide having the sequenceMGRVRTLAGECSAQAQAQSLLAVVLSAPPSGGTPSARLSVRSPSPRDPWGLWAPVLQ (SEQ ID NO:1766) corresponding to amino acids 1-57 of H38804_PEA_(—)1_P17 (SEQ IDNO:1308), and a second amino acid sequence being at least 90% homologoustoMTGSNEFKLNQPPEDGISSVKFSPNTSQFLLVSSWDTSVRLYDVPANSMRLKYQHTGAVLDCAFYDPTHAWSGGLDHQLKMHDLNTDQENLVGTHDAPIRCVEYCPEVNVMVTGSWDQTVKLWDPRTPCNAGTFSQPEKVYTLSVSGDRLIVGTAGRRVLVWDLRNMGYVQQRRESSLKYQTRCIRAFPNKQGYVLSSIEGRVAVEYLDPSPEVQKKKYAFKCHRLKENNIEQIYPVNAISFHNIHNTFATGGSDGFVNIWDPFNKKRLCQFHRYPTSIASLAFSNDGTTLAIASSYMYEMDDTEHPEDGIFIRQVTDAETKPKSPCT corresponding to aminoacids 1-328 of BUB3_HUMAN (SEQ ID NO:1424), which also corresponds toamino acids 58-385 of H38804_PEA_(—)1_P17 (SEQ ID NO:1308), wherein saidfirst and second amino acid sequences are contiguous and in a sequentialorder.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of H38804_PEA_1_P17 (SEQ ID NO:1308), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MGRVRTLAGECSAQAQAQSLLAVVLSAPPSGGTPSARLSVRSPSPRDPWGLWAPVLQ (SEQ ID NO: 1766) of H38804_PEA_1_P17 (SEQ ID NO:1308).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSENA78_P2 (SEQ ID NO:1309), comprising a first amino acid sequence being at least 90% homologous to MSLLSSRAARVPGPSSSLCALLVLLLLLTQPGPIASAGPAAAVLRELRCVCLQTTQGVHPKMISNLQVFAIGPQCSKVEVV corresponding to amino acids 1-81 of SZ05_HUMAN (SEQ ID NO:1425), which also corresponds to amino acids 1-81 of HSENA78_P2 (SEQ ID NO:1309).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding for HUMODCA_P9 (SEQID NO:1310), comprising a first amino acid sequence being at least 70%,optionally at least 80%, preferably at least 85%, more preferably atleast 90% and most preferably at least 95% homologous to a polypeptidehaving the sequence MKSLTATSSMKVLLPRTFWTRKLMKFLLL (SEQ ID NO: 1768)corresponding to amino acids 1-29 of HUMODCA_P9 (SEQ ID NO:1310), and asecond amino acid sequence being at least 90% homologous toLVLRIATDDSKAVCRLSVKFGATLRTSRLLLERAKELNIDVVGVSFHVGSGCTDPETFVQAISDARCVFDMGAEVGFSMYLLDIGGGFPGSEDVKLKFEEITGVINPALDKYFPSDSGVRIIAEPGRYYVASAFTLAVNIIAKKIVLKEQTGSDDEDESSEQTFMYYVNDGVYGSFNCILYDHAHVKPLLQKRPKPDEKYYSSSIWGPTCDGLDRIVERCDLPEMHVGDWMLFENMGAYTVAAASTFNGFQRPTIYYVMSGPAWQLMQQFQNPDFPPEVEEQDASTLPVSCAWESGMKRHRAACASASINV corresponding to amino acids 151-461 ofDCOR_HUMAN (SEQ ID NO:1426), which also corresponds to amino acids30-340 of HUMODCA_P9 (SEQ ID NO:1310), wherein said first and secondamino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of HUMODCA_P9 (SEQ ID NO:1310), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MKSLTATSSMKVLLPRTFWTRKLMKFLLL (SEQ ID NO: 1768) of HUMODCA_P9 (SEQ ID NO:1310).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding for HUMODCA_P9 (SEQID NO:1310), comprising a first amino acid sequence being at least 70%,optionally at least 80%, preferably at least 85%, more preferably atleast 90% and most preferably at least 95% homologous to a polypeptidehaving the sequence MKSLTATSSMKVLLPRTFWTRKLMKFLLL (SEQ ID NO: 1768)corresponding to amino acids 1-29 of HUMODCA_P9 (SEQ ID NO:1310), and asecond amino acid sequence being at least 90% homologous toLVLRIATDDSKAVCRLSVKFGATLRTSRLLLERAKELNIDVVGVSFHVGSGCTDPETFVQAISDARCVFDMGAEVGFSMYLLDIGGGFPGSEDVKLKFEEITGVINPALDKYFPSDSGVRIIAEPGRYYVASAFTLAVNIIAKKIVLKEQTGSDDEDESSEQTFMYYVNDGVYGSFNCILYDHAHVKPLLQKRPKPDEKYYSSSIWGPTCDGLDRIVERCDLPEMHVGDWMLFENMGAYTVAAASTFNGFQRPTIYYVMSGPAWQLMQQFQNPDFPPEVEEQDASTLPVSCAWESGMKRHRAACASASINV corresponding to amino acids 40-350 ofAAA59968, which also corresponds to amino acids 30-340 of HUMODCA_P9(SEQ ID NO:1310), wherein said first and second amino acid sequences arecontiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of HUMODCA_P9 (SEQ ID NO:1310), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MKSLTATSSMKVLLPRTFWTRKLMKFLLL (SEQ ID NO: 1768) of HUMODCA_P9 (SEQ ID NO:1310).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding for HUMODCA_P9 (SEQID NO:1310), comprising a first amino acid sequence being at least 70%,optionally at least 80%, preferably at least 85%, more preferably atleast 90% and most preferably at least 95% homologous to a polypeptidehaving the sequence MKSLTATSSMKVLLPRTFWTRKLMKFLLL (SEQ ID NO: 1768)corresponding to amino acids 1-29 of HUMODCA_P9 (SEQ ID NO:1310), and asecond amino acid sequence being at least 90% homologous toLVLRIATDDSKAVCRLSVKFGATLRTSRLLLERAKELNIDVVGVSFHVGSGCTDPETFVQAISDARCVFDMGAEVGFSMYLLDIGGGFPGSEDVKLKFEEITGVINPALDKYFPSDSGVRIIAEPGRYYVASAFTLAVNIIAKKIVLKEQTGSDDEDESSEQTFMYYVNDGVYGSFNCILYDHAHVKPLLQKRPKPDEKYYSSSIWGPTCDGLDRIVERCDLPEMHVGDWMLFENMGAYTVAAASTFNGFQRPTIYYVMSGPAWQLMQQFQNPDFPPEVEEQDASTLPVSCAWESGMKRHRAACASASINV corresponding to amino acids 86-396 ofAAH14562 (SEQ ID NO:1703), which also corresponds to amino acids 30-340of HUMODCA_P9 (SEQ ID NO:1310), wherein said first and second amino acidsequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of HUMODCA_P9 (SEQ ID NO:1310), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MKSLTATSSMKVLLPRTFWTRKLMKFLLL (SEQ ID NO: 1768) of HUMODCA_P9 (SEQ ID NO:1310).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for R00299_P3 (SEQ ID NO:1311), comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MAEKALLCPSSAGLGTWPWVLNSAWPVLPLAVDQGVDWRPRGPV (SEQ ID NO: 1769) corresponding to amino acids 1-44 of R00299_P3 (SEQ ID NO:1311), a second amino acid sequence being at least 90% homologous to SSDQIEQLHRRFKQLSGDQPTIRKENFNNVPDLELNPIRSKIVRAFFDNRNLRKGPSGLADEINFEDFLTIMSYFRPIDTTMDEEQVELSRKEKLRFLFHMYDSDSDGRITLEEYRNV corresponding to amino acids 74-191 of Q9NWT9 (SEQ ID NO:1704), which also corresponds to amino acids 45-162 of R00299_P3 (SEQ ID NO:1311), and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VEELLSGNPHIEKESARSIADGAMMEAASVCMGQMEPDQVYEGITFEDFLKIWQGIDIETKMHVRFLNMETMALCH (SEQ ID NO: 1770) corresponding to amino acids 163-238 of R00299_P3 (SEQ ID NO:1311), wherein said first, second and third amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of R00299_P3 (SEQ ID NO:1311), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MAEKALLCPSSAGLGTWPWVLNSAWPVLPLAVDQGVDWRPRGPV (SEQ ID NO: 1769) of R00299_P3 (SEQ ID NO:1311).

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of R00299_P3 (SEQ ID NO:1311), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VEELLSGNPHIEKESARSIADGAMMEAASVCMGQMEPDQVYEGITFEDFLKIWQGIDIETKMHVRFLNMETMALCH (SEQ ID NO: 1770) in R00299_P3 (SEQ ID NO:1311).

According to preferred embodiments of the present invention, there isprovided an isolated chimeric polypeptide encoding for R00299_P3 (SEQ IDNO:1311), comprising a first amino acid sequence being at least 70%,optionally at least 80%, preferably at least 85%, more preferably atleast 90% and most preferably at least 95% homologous to a polypeptidehaving the sequence MAEKALLCPSSAGLGTWPWVLNSAWPVLPLAVDQGVDWRPRGPV (SEQ IDNO: 1769) corresponding to amino acids 1-44 of R00299_P3 (SEQ IDNO:1311), and a second amino acid sequence being at least 90% homologoustoSSDQIEQLHRRFKQLSGDQPTIRKENFNNVPDLELNPIRSKIVRAFFDNRNLRKGPSGLADEINFEDFLTIMSYFRPIDTTMDEEQVELSRKEKLRFLFHMYDSDSDGRITLEEYRNVVEELLSGNPHIEKESARSIADGAMMEAASVCMGQMEPDQVYEGITFEDFLKIWQGIDIETKMHVRFLNMETMALCH (SEQ ID NO: 1770)corresponding to amino acids 21-214 of TESC_HUMAN (SEQ ID NO:1427),which also corresponds to amino acids 45-238 of R00299_P3 (SEQ IDNO:1311), wherein said first and second amino acid sequences arecontiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of R00299_P3 (SEQ ID NO:1311), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MAEKALLCPSSAGLGTWPWVLNSAWPVLPLAVDQGVDWRPRGPV (SEQ ID NO: 1769) of R00299_P3 (SEQ ID NO:1311).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for W60282_PEA_1_P14 (SEQ ID NO:1312), comprising a first amino acid sequence being at least 90% homologous to MRILQLILLALATGLVGGETRIIKGFECKPHSQPWQAALFEKTRLLCGATLIAPRWLLTAAHCLKP corresponding to amino acids 1-66 of Q8IXD7 (SEQ ID NO:1705), which also corresponds to amino acids 1-66 of W60282_PEA_1_P14 (SEQ ID NO:1312), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TPASHLAMRQHHHH (SEQ ID NO: 1771) corresponding to amino acids 67-80 of W60282_PEA_1_P14 (SEQ ID NO:1312), wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of W60282_PEA_1_P14 (SEQ ID NO:1312), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TPASHLAMRQHHHH (SEQ ID NO: 1771) in W60282_PEA_1_P14 (SEQ ID NO:1312).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for Z41644_PEA_1_P10 (SEQ ID NO:1313), comprising a first amino acid sequence being at least 90% homologous to MRLLAAALLLLLLALYTARVDGSKCKCSRKGPKIRYSDVKKLEMKPKYPHCEEKMVIITTKSVSRYRGQEHCLHPKLQSTKRFIKWYNAWNEKRR corresponding to amino acids 1-95 of SZ14_HUMAN (SEQ ID NO:1429), which also corresponds to amino acids 1-95 of Z41644_PEA_1_P10 (SEQ ID NO:1313), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence YAPPLLTFLPTRPSCGSQDGKGPPHQVI (SEQ ID NO: 1772) corresponding to amino acids 96-123 of Z41644_PEA_1_P10 (SEQ ID NO:1313), wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of Z41644_PEA_1_P10 (SEQ ID NO:1313), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence YAPPLLTFLPTRPSCGSQDGKGPPHQVI (SEQ ID NO: 1772) in Z41644_PEA_1_P10 (SEQ ID NO:1313).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for Z41644_PEA_1_P10 (SEQ ID NO:1313), comprising a first amino acid sequence being at least 90% homologous to MRLLAAALLLLLLALYTARVDGSKCKCSRKGPKIRYSDVKKLEMKPKYPHCEEKMVIITTKSVSRYRGQEHCLHPKLQSTKRFIKWYNAWNEKRR corresponding to amino acids 13-107 of Q9NS21 (SEQ ID NO:1706), which also corresponds to amino acids 1-95 of Z41644_PEA_1_P10 (SEQ ID NO:1313), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence YAPPLLTFLPTRPSCGSQDGKGPPHQVI (SEQ ID NO: 1772) corresponding to amino acids 96-123 of Z41644_PEA_1_P10 (SEQ ID NO:1313), wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of Z41644_PEA_1_P10 (SEQ ID NO:1313), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence YAPPLLTFLPTRPSCGSQDGKGPPHQVI (SEQ ID NO: 1772) in Z41644_PEA_1_P10 (SEQ ID NO:1313).

According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for Z41644_PEA_1_P10 (SEQ ID NO:1313), comprising a first amino acid sequence being at least 90% homologous to MRLLAAALLLLLLALYTARVDGSKCKCSRKGPKIRYSDVKKLEMKPKYPHCEEKMVIITTKSVSRYRGQEHCLHPKLQSTKRFIKWYNAWNEKRR corresponding to amino acids 13-107 of AAQ89265 (SEQ ID NO:781), which also corresponds to amino acids 1-95 of Z41644_PEA_1_P10 (SEQ ID NO:1313), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence YAPPLLTFLPTRPSCGSQDGKGPPHQVI (SEQ ID NO: 1772) corresponding to amino acids 96-123 of Z41644_PEA_1_P10 (SEQ ID NO:1313), wherein said first and second amino acid sequences are contiguous and in a sequential order.

According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of Z41644_PEA_1_P10 (SEQ ID NO:1313), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence YAPPLLTFLPTRPSCGSQDGKGPPHQVI (SEQ ID NO: 1772) in Z41644_PEA_1_P10 (SEQ ID NO:1313).

According to preferred embodiments of the present invention, there is provided an antibody capable of specifically binding to an epitope of an amino acid sequence.

Optionally the amino acid sequence corresponds to a bridge, edge portion, tail, head or insertion.

Optionally the antibody is capable of differentiating between a splice variant having said epitope and a corresponding known protein.

According to preferred embodiments of the present invention, there is provided a kit for detecting lung cancer, comprising a kit detecting overexpression of a splice variant according to any of the above claims.

Optionally the kit comprises a NAT-based technology.

Optionally the kit further comprises at least one primer pair capable of selectively hybridizing to a nucleic acid sequence according to any of the above claims.

Optionally the kit further comprises at least one oligonucleotide capable of selectively hybridizing to a nucleic acid sequence according to any of the above claims.

Optionally the kit comprises an antibody according to any of the above claims.

Optionally the kit further comprises at least one reagent for performing an ELISA or a Western blot.

According to preferred embodiments of the present invention, there is provided a method for detecting lung cancer, comprising detecting overexpression of a splice variant according to any of the above claims.

Optionally the detecting overexpression is performed with a NAT-based technology.

Optionally detecting overexpression is performed with an immunoassay.

Optionally the immunoassay comprises an antibody according to any of the above claims.

According to preferred embodiments of the present invention, there is provided a biomarker capable of detecting lung cancer, comprising any of the above nucleic acid sequences or a fragment thereof, or any of the above amino acid sequences or a fragment thereof.

According to preferred embodiments of the present invention, there is provided a method for screening for lung cancer, comprising detecting lung cancer cells with a biomarker or an antibody or a method or assay according to any of the above claims.

According to preferred embodiments of the present invention, there is provided a method for diagnosing lung cancer, comprising detecting lung cancer cells with a biomarker or an antibody or a method or assay according to any of the above claims.

According to preferred embodiments of the present invention, there is provided a method for monitoring disease progression and/or treatment efficacy and/or relapse of lung cancer, comprising detecting lung cancer cells with a biomarker or an antibody or a method or assay according to any of the above claims.

According to preferred embodiments of the present invention, there is provided a method of selecting a therapy for lung cancer, comprising detecting lung cancer cells with a biomarker or an antibody or a method or assay according to any of the above claims and selecting a therapy according to said detection.

According to some embodiments of the present invention, there is provided an isolated polynucleotide comprising the polynucleotide sequence set forth in a member selected from the group consisting of SEQ ID NOs: 1-195, 204-250, 306-323, 335-693, 695-1021, 1067-1100, 1276-1280, 1464-1465, 1480, 1512-1514, 1517, 1529, 1532, 1558, 1574, 1594, 1600, 1616, 1619, 1622, 1625, 1626, 1636, 1639, 1642, 1645, 1648, 1651, 1654, 1657, 1660, 1663, 1666, 1669, 1672, 1675, 1678, 1681, 1684, 1687, 1690, and 1693, or a sequence at least about 95% identical thereto.

According to some embodiments of the present invention, there is provided an isolated polypeptide comprising the polypeptide sequence set forth in a member selected from the group consisting of SEQ ID NOs: 251-279, 324-325, 369, 622, 694, 1281-1294, 1299-1415, 1508-1511, 1523, 1569-1571, 1581, 1583, 1585, 1613, 1627-1629, 1702, and 1717-1776, or a sequence at least about 95% identical thereto.
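By way of illustration only, the group of SEQ ID NOs recited above mixes ranges and single numbers, and membership in the group can be checked mechanically. The sketch below expands the recited list into a set of integers. The function name `expand_seq_ids` is hypothetical, and the string is transcribed from the paragraph above with the word "and" omitted.

```python
# Polypeptide SEQ ID NOs as recited above (ranges are inclusive).
POLYPEPTIDE_SEQ_IDS = (
    "251-279, 324-325, 369, 622, 694, 1281-1294, 1299-1415, 1508-1511, "
    "1523, 1569-1571, 1581, 1583, 1585, 1613, 1627-1629, 1702, 1717-1776"
)


def expand_seq_ids(spec):
    """Expand a comma-separated list of numbers and inclusive ranges into a set of ints."""
    ids = set()
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue
        if "-" in token:
            low, high = (int(part) for part in token.split("-"))
            if low > high:
                raise ValueError(f"descending range: {token}")
            ids.update(range(low, high + 1))
        else:
            ids.add(int(token))
    return ids


polypeptide_ids = expand_seq_ids(POLYPEPTIDE_SEQ_IDS)
print(1301 in polypeptide_ids)  # variant D56406_PEA_1_P2
print(1422 in polypeptide_ids)  # known protein NEUT_HUMAN
```

The check for descending ranges also serves as a basic consistency test when such lists are transcribed by hand.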

According to some embodiments of the present invention, there is provided an expression vector comprising any one of the foregoing polynucleotide sequences.

According to some embodiments of the present invention, there is provided a host cell comprising the foregoing vector.

According to some embodiments of the present invention, there is provided a process for producing a polypeptide comprising:

culturing the foregoing host cell under conditions suitable to produce the polypeptide encoded by said polynucleotide; and recovering said polypeptide.

According to some embodiments of the present invention, there is provided an isolated primer pair, comprising the pair of nucleic acid sequences selected from the group consisting of SEQ ID NOs: 1478-1479, 1515-1516, 1527-1528, 1530-1531, 1556-1557, 1572-1573, 1592-1593, 1598-1599, 1614-1615, 1617-1618, 1620-1621, 1623-1624, 1634-1635, 1637-1638, 1640-1641, 1643-1644, 1646-1647, 1649-1650, 1652-1653, 1655-1656, 1658-1659, 1661-1662, 1664-1665, 1667-1668, 1670-1671, 1673-1674, 1676-1677, 1679-1680, 1682-1683, 1685-1686, 1688-1689, and 1691-1692.

According to some embodiments of the present invention, there is provided an antibody capable of specifically binding to any one of the foregoing polypeptides.

According to some embodiments of the present invention, there is provided a kit for detecting lung cancer, comprising at least one of the foregoing primer pairs.

According to some embodiments of the present invention, there is provided a kit for detecting lung cancer, comprising the foregoing antibody.

According to further embodiments of the present invention, there is provided the foregoing kit, wherein said immunoassay is selected from the group consisting of an enzyme linked immunosorbent assay (ELISA), an immunoprecipitation assay, an immunofluorescence analysis, an enzyme immunoassay (EIA), a radioimmunoassay (RIA), and a Western blot analysis.

According to some embodiments of the present invention, there is provided a method for detecting lung cancer, comprising detecting overexpression of the polynucleotide sequence set forth in a member selected from the group consisting of SEQ ID NOs: 1-195, 204-250, 306-323, 335-693, 695-1021, 1067-1100, 1276-1280, 1464-1465, 1480, 1512-1514, 1517, 1529, 1532, 1558, 1574, 1594, 1600, 1616, 1619, 1622, 1625, 1626, 1636, 1639, 1642, 1645, 1648, 1651, 1654, 1657, 1660, 1663, 1666, 1669, 1672, 1675, 1678, 1681, 1684, 1687, 1690, and 1693, or a sequence at least about 95% identical thereto in a sample from a patient.

According to further embodiments of the present invention, there is provided the foregoing method for detecting lung cancer, wherein said detecting overexpression comprises performing nucleic acid amplification.

According to some embodiments of the present invention, there is provided a method for detecting lung cancer, comprising detecting overexpression of the polypeptide comprising the polypeptide sequence set forth in a member selected from the group consisting of SEQ ID NOs: 251-279, 324-325, 369, 622, 694, 1281-1294, 1299-1415, 1508-1511, 1523, 1569-1571, 1581, 1583, 1585, 1613, 1627-1629, 1702, and 1717-1776 in a sample from a patient.

According to further embodiments of the present invention, there is provided the foregoing method for detecting lung cancer, wherein said detecting comprises detecting binding of the foregoing antibody to the polypeptide comprising the polypeptide sequence set forth in a member selected from the group consisting of SEQ ID NOs: 251-279, 324-325, 369, 622, 694, 1281-1294, 1299-1415, 1508-1511, 1523, 1569-1571, 1581, 1583, 1585, 1613, 1627-1629, 1702, and 1717-1776 in a sample from a patient.

According to some embodiments of the present invention, there is provided a biomarker for detecting lung cancer, comprising an amino acid sequence comprising the polypeptide sequence set forth in a member selected from the group consisting of SEQ ID NOs: 251-279, 324-325, 369, 622, 694, 1281-1294, 1299-1415, 1508-1511, 1523, 1569-1571, 1581, 1583, 1585, 1613, 1627-1629, 1702, and 1717-1776, or a sequence at least about 95% identical thereto, marked with a label.

According to some embodiments of the present invention, there is provided a method to screen for or to diagnose lung cancer, comprising detecting the disease with the biomarker comprising an amino acid sequence comprising the polypeptide sequence set forth in a member selected from the group consisting of SEQ ID NOs: 251-279, 324-325, 369, 622, 694, 1281-1294, 1299-1415, 1508-1511, 1523, 1569-1571, 1581, 1583, 1585, 1613, 1627-1629, 1702, and 1717-1776, or a sequence at least about 95% identical thereto.

According to some embodiments of the present invention, there is provided a method for monitoring disease progression, treatment efficacy or relapse of lung cancer, comprising detecting the disease with the biomarker comprising an amino acid sequence comprising the polypeptide sequence set forth in a member selected from the group consisting of SEQ ID NOs: 251-279, 324-325, 369, 622, 694, 1281-1294, 1299-1415, 1508-1511, 1523, 1569-1571, 1581, 1583, 1585, 1613, 1627-1629, 1702, and 1717-1776, or a sequence at least about 95% identical thereto.

According to some embodiments of the present invention, there is provided a method of selecting a therapy for lung cancer, comprising detecting the disease with the biomarker comprising an amino acid sequence comprising the polypeptide sequence set forth in a member selected from the group consisting of SEQ ID NOs: 251-279, 324-325, 369, 622, 694, 1281-1294, 1299-1415, 1508-1511, 1523, 1569-1571, 1581, 1583, 1585, 1613, 1627-1629, 1702, and 1717-1776, or a sequence at least about 95% identical thereto, and selecting a therapy according to said detection.

According to some embodiments of the present invention, there is provided a biomarker for detecting lung cancer, comprising a nucleic acid sequence set forth in a member selected from the group consisting of SEQ ID NOs: 1-195, 204-250, 306-323, 335-693, 695-1021, 1067-1100, 1276-1280, 1464-1465, 1480, 1512-1514, 1517, 1529, 1532, 1558, 1574, 1594, 1600, 1616, 1619, 1622, 1625, 1626, 1636, 1639, 1642, 1645, 1648, 1651, 1654, 1657, 1660, 1663, 1666, 1669, 1672, 1675, 1678, 1681, 1684, 1687, 1690, and 1693, or a sequence at least about 95% identical thereto.

According to some embodiments of the present invention, there is provided a method to screen for or to diagnose lung cancer, comprising detecting the disease with the biomarker comprising a nucleic acid sequence set forth in a member selected from the group consisting of SEQ ID NOs: 1-195, 204-250, 306-323, 335-693, 695-1021, 1067-1100, 1276-1280, 1464-1465, 1480, 1512-1514, 1517, 1529, 1532, 1558, 1574, 1594, 1600, 1616, 1619, 1622, 1625, 1626, 1636, 1639, 1642, 1645, 1648, 1651, 1654, 1657, 1660, 1663, 1666, 1669, 1672, 1675, 1678, 1681, 1684, 1687, 1690, and 1693, or a sequence at least about 95% identical thereto.

According to some embodiments of the present invention, there is provided a method for monitoring disease progression, treatment efficacy or relapse of lung cancer, comprising detecting the disease with the biomarker comprising a nucleic acid sequence set forth in a member selected from the group consisting of SEQ ID NOs: 1-195, 204-250, 306-323, 335-693, 695-1021, 1067-1100, 1276-1280, 1464-1465, 1480, 1512-1514, 1517, 1529, 1532, 1558, 1574, 1594, 1600, 1616, 1619, 1622, 1625, 1626, 1636, 1639, 1642, 1645, 1648, 1651, 1654, 1657, 1660, 1663, 1666, 1669, 1672, 1675, 1678, 1681, 1684, 1687, 1690, and 1693, or a sequence at least about 95% identical thereto.

According to some embodiments of the present invention, there is provided a method of selecting a therapy for lung cancer, comprising detecting the disease with the biomarker comprising a nucleic acid sequence set forth in a member selected from the group consisting of SEQ ID NOs: 1-195, 204-250, 306-323, 335-693, 695-1021, 1067-1100, 1276-1280, 1464-1465, 1480, 1512-1514, 1517, 1529, 1532, 1558, 1574, 1594, 1600, 1616, 1619, 1622, 1625, 1626, 1636, 1639, 1642, 1645, 1648, 1651, 1654, 1657, 1660, 1663, 1666, 1669, 1672, 1675, 1678, 1681, 1684, 1687, 1690, and 1693, or a sequence at least about 95% identical thereto, and selecting a therapy according to said detection.
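The "at least about 95% identical" criterion recited in the foregoing embodiments can be illustrated with a minimal sketch. It assumes that the two sequences have already been aligned by some pairwise alignment tool, that gaps are written as '-', and that identity is counted as matching columns over all aligned columns; this passage does not fix a particular alignment program or identity definition, so the function names and the definition used here are illustrative assumptions only.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pairwise alignment.

    Assumptions: both strings come from the same alignment, have equal
    length, and use '-' for gaps. Columns that are gaps in both sequences
    are ignored; a gap opposite a residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 95.0) -> bool:
    """Return True if the aligned pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy example: 19 of 20 aligned positions match.
reference = "ACGTACGTACGTACGTACGT"
candidate = "ACGTACGTACGTACGTACGA"
print(percent_identity(reference, candidate))
print(meets_identity_threshold(reference, candidate))
```

Other definitions of identity (for example, dividing by the length of the shorter sequence) give different values for gapped alignments, so the definition should be stated whenever a threshold is applied.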

Unless defined otherwise, all technical and scientific terms used herein have the meaning commonly understood by a person skilled in the art to which this invention belongs. The following references provide one of skill with a general definition of many of the terms used in this invention: Singleton et al., Dictionary of Microbiology and Molecular Biology (2nd ed. 1994); The Cambridge Dictionary of Science and Technology (Walker ed., 1988); The Glossary of Genetics, 5th Ed., R. Rieger et al. (eds.), Springer Verlag (1991); and Hale & Marham, The Harper Collins Dictionary of Biology (1991). All of these are hereby incorporated by reference as if fully set forth herein. As used herein, the following terms have the meanings ascribed to them unless specified otherwise.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a schematic summary of the cancer biomarker selection engine and the wet validation stages.

FIG. 2 is a schematic illustration depicting grouping of transcripts of a given contig based on presence or absence of unique sequence regions.

FIG. 3 is a schematic summary of quantitative real-time PCR analysis.

FIG. 4 is a schematic presentation of the oligonucleotide based microarray fabrication.

FIG. 5 is a schematic summary of the oligonucleotide based microarray experimental flow.

FIG. 6 is a histogram showing Cancer and cell-line vs. normal tissue expression for Cluster H61775, demonstrating overexpression in brain malignant tumors and a mixture of malignant tumors from different tissues.

FIG. 7 is a histogram showing expression of transcripts of variants of the immunoglobulin superfamily, member 9, H61775 transcripts, which are detectable by amplicon as depicted in sequence name H61775seg8 (SEQ ID NO: 1636), in normal and cancerous lung tissues.

FIG. 8 is a histogram showing expression of immunoglobulin superfamily, member 9, H61775 transcripts, which are detectable by amplicon as depicted in sequence name H61775seg8 (SEQ ID NO: 1636), in different normal tissues.

FIG. 9 is a histogram showing Cancer and cell-line vs. normal tissue expression for Cluster M85491, demonstrating overexpression in epithelial malignant tumors and a mixture of malignant tumors from different tissues.

FIG. 10 is a histogram showing over expression of the above-indicated Ephrin type-B receptor 2 precursor M85491 transcripts, which are detectable by amplicon as depicted in sequence name M85491seg24 (SEQ ID NO: 1639), in cancerous lung samples relative to the normal samples.

FIG. 11 is a histogram showing the expression of Ephrin type-B receptor 2 precursor (Tyrosine-protein kinase receptor EPH-3) M85491 transcripts which are detectable by amplicon as depicted in sequence name M85491seg24 (SEQ ID NO: 1639) in different normal tissues.

FIG. 12 is a histogram showing Cancer and cell-line vs. normal tissue expression for Cluster T39971, demonstrating overexpression in liver cancer, lung malignant tumors and pancreas carcinoma.

FIG. 13 is a histogram showing Cancer and cell-line vs. normal tissue expression for Cluster Z21368, demonstrating overexpression in epithelial malignant tumors, a mixture of malignant tumors from different tissues and pancreas carcinoma.

FIG. 14 is a histogram showing over expression of the Extracellular sulfatase Sulf-1 Z21368 transcripts, which are detectable by amplicon as depicted in sequence name Z21368junc17-21 (SEQ ID NO: 1642), in cancerous lung samples relative to the normal samples.

FIG. 15 is a histogram showing the expression of Extracellular sulfatase Sulf-1 Z21368 transcripts, which are detectable by amplicon as depicted in sequence name Z21368 junc17-21 (SEQ ID NO: 1642), in different normal tissues.

FIG. 16 is a histogram showing over expression of the SUL1_HUMAN - Extracellular sulfatase Sulf-1, Z21368 transcripts, which are detectable by amplicon as depicted in sequence name Z21368seg39 (SEQ ID NO: 1645), in cancerous lung samples relative to the normal samples.

FIG. 17 is a histogram showing expression of SUL1_HUMAN - Extracellular sulfatase Sulf-1, Z21368 transcripts, which are detectable by amplicon as depicted in sequence name Z21368seg39 (SEQ ID NO: 1645), in different normal tissues.

FIG. 18 is a histogram showing the expression of SMO2_HUMAN SPARC related modular calcium-binding protein 2 precursor (Secreted modular calcium-binding protein 2) (SMOC-2) (Smooth muscle-associated protein 2) Z44808 transcripts which are detectable by amplicon as depicted in sequence name Z44808 junc8-11 (SEQ ID NO: 1651) in different normal tissues.

FIG. 19 is a histogram showing over expression of the gastrin-releasing peptide (HUMGRP5E) transcripts, which are detectable by amplicon as depicted in sequence name HUMGRP5Ejunc3-7 (SEQ ID NO: 1648), in several cancerous lung samples relative to the normal samples.

FIG. 20 is a histogram showing the expression of gastrin-releasing peptide (HUMGRP5E) transcripts, which are detectable by amplicon as depicted in sequence name HUMGRP5Ejunc3-7 (SEQ ID NO: 1648), in different normal tissues.

FIG. 21 is a histogram showing Cancer and cell-line vs. normal tissue expression for Cluster F05068, demonstrating overexpression in uterine malignancies.

FIG. 22 is a histogram showing Cancer and cell-line vs. normal tissue expression for Cluster H14624, demonstrating overexpression in colorectal cancer, epithelial malignant tumors, a mixture of malignant tumors from different tissues, lung malignant tumors and pancreas carcinoma.

FIG. 23 is a histogram showing Cancer and cell-line vs. normal tissue expression for Cluster H38804, demonstrating overexpression in transitional cell carcinoma, brain malignant tumors, a mixture of malignant tumors from different tissues and gastric carcinoma.

FIG. 24 is a histogram showing Cancer and cell-line vs. normal tissue expression for Cluster HSENA78, demonstrating overexpression in epithelial malignant tumors and lung malignant tumors.

FIG. 25 is a histogram showing Cancer and cell-line vs. normal tissue expression for Cluster HUMODCA, demonstrating overexpression in brain malignant tumors, colorectal cancer, epithelial malignant tumors and a mixture of malignant tumors from different tissues.

FIG. 26 is a histogram showing Cancer and cell-line vs. normal tissue expression for Cluster R00299, demonstrating overexpression in lung malignant tumors.

FIG. 27 is a histogram showing Cancer and cell-line vs. normal tissue expression for Cluster Z41644, demonstrating overexpression in lung malignant tumors, breast malignant tumors and pancreas carcinoma.

FIG. 28 is a histogram showing Cancer and cell-line vs. normal tissue expression for Cluster Z44808, demonstrating overexpression in colorectal cancer, lung cancer and pancreas carcinoma.

FIG. 29 is a histogram showing over expression of the SMO2_HUMAN SPARC related modular calcium-binding protein 2 Z44808 transcripts, which are detectable by amplicon as depicted in sequence name Z44808junc8-11 (SEQ ID NO: 1651), in cancerous lung samples relative to the normal samples.

FIG. 30 is a histogram showing Cancer and cell-line vs. normal tissue expression for Cluster AA161187, demonstrating overexpression in brain malignant tumors, epithelial malignant tumors and a mixture of malignant tumors from different tissues.

FIG. 31 is a histogram showing Cancer and cell-line vs. normal tissue expression for Cluster AA161187, demonstrating overexpression in brain malignant tumors and a mixture of malignant tumors from different tissues.

FIG. 32 is a histogram showing Cancer and cell-line vs. normal tissue expression for Cluster HUMCA1XIA, demonstrating overexpression in bone malignant tumors, epithelial malignant tumors, a mixture of malignant tumors from different tissues and lung malignant tumors.

FIG. 33 is a histogram showing Cancer and cell-line vs. normal tissue expression for Cluster HUMCEA, demonstrating overexpression in epithelial malignant tumors, a mixture of malignant tumors from different tissues and pancreas carcinoma.

FIG. 34 is a histogram showing Cancer and cell-line vs. normal tissue expression for Cluster R35137, demonstrating overexpression in hepatocellular carcinoma.

FIG. 35 is a histogram showing Cancer and cell-line vs. normal tissue expression for Cluster Z25299, demonstrating overexpression in brain malignant tumors, a mixture of malignant tumors from different tissues and ovarian carcinoma.

FIG. 36 is a histogram showing down regulation of the Secretory leukocyte protease inhibitor Acid-stable proteinase inhibitor Z25299 transcripts, which are detectable by amplicon as depicted in sequence name Z25299 junc13-14-21 (SEQ ID NO: 1666), in cancerous lung samples relative to the normal samples.

FIG. 37 is a histogram showing down regulation of the Secretory leukocyte protease inhibitor Acid-stable proteinase inhibitor Z25299 transcripts, which are detectable by amplicon as depicted in sequence name Z25299 seg20 (SEQ ID NO: 1669), in cancerous lung samples relative to the normal samples.

FIG. 38 is a histogram showing Cancer and cell-line vs. normal tissue expression for Cluster HSSTROL3, demonstrating overexpression in transitional cell carcinoma, epithelial malignant tumors, a mixture of malignant tumors from different tissues and pancreas carcinoma.

FIG. 39 is a histogram showing over expression of the Stromelysin-3 HSSTROL3 transcripts, which are detectable by amplicon as depicted in sequence name HSSTROL3 seg24 (SEQ ID NO: 1675), in cancerous lung samples relative to the normal samples.

FIG. 40 is a histogram showing the expression of Stromelysin-3 HSSTROL3 transcripts, which are detectable by amplicon as depicted in sequence name HSSTROL3 seg24 (SEQ ID NO: 1675), in different normal tissues.

FIG. 41 is a histogram showing Cancer and cell-line vs. normal tissue expression for Cluster HUMTREFAC, demonstrating overexpression in a mixture of malignant tumors from different tissues, breast malignant tumors, pancreas carcinoma and prostate cancer.

FIG. 42 is a histogram showing Cancer and cell-line vs. normal tissue expression for Cluster HSS100PCB, demonstrating overexpression in a mixture of malignant tumors from different tissues.

FIG. 43 is a histogram showing Cancer and cell-line vs. normal tissue expression for Cluster HSU33147, demonstrating overexpression in a mixture of malignant tumors from different tissues.

FIG. 44 is a histogram showing Cancer and cell-line vs. normal tissue expression for Cluster R20779, demonstrating overexpression in epithelial malignant tumors, a mixture of malignant tumors from different tissues and lung malignant tumors.

FIG. 45 is a histogram showing Cancer and cell-line vs. normal tissue expression for Cluster R38144, demonstrating overexpression in epithelial malignant tumors, lung malignant tumors, skin malignancies and gastric carcinoma.

FIG. 46 is a histogram showing Cancer and cell-line vs. normal tissue expression for Cluster HUMOSTRO, demonstrating overexpression in epithelial malignant tumors, a mixture of malignant tumors from different tissues, lung malignant tumors, breast malignant tumors, ovarian carcinoma and skin malignancies.

FIG. 47 is a histogram showing Cancer and cell-line vs. normal tissue expression for Cluster HUMOSTRO, demonstrating overexpression in epithelial malignant tumors, a mixture of malignant tumors from different tissues and kidney malignant tumors.

FIG. 48 is a histogram showing over expression of the R11723 transcripts, which are detectable by amplicon as depicted in sequence name R11723 seg13 (SEQ ID NO: 1684), in cancerous lung samples relative to the normal samples.

FIG. 49 is a histogram showing the expression of R11723 transcripts which are detectable by amplicon as depicted in sequence name R11723seg13 (SEQ ID NO: 1684) in different normal tissues.

FIG. 50 is a histogram showing over expression of the R11723 transcripts, which are detectable by amplicon as depicted in sequence name R11723 junc11-18 (SEQ ID NO: 1687) in cancerous lung samples relative to the normal samples.

FIG. 51 is a histogram showing Cancer and cell-line vs. normal tissue expression for Cluster R16276, demonstrating overexpression in lung malignant tumors.

FIGS. 52-53 are histograms showing differential expression of the 6 sequences H61775seg8 (SEQ ID NO: 1636), HUMGRP5E junc3-7 (SEQ ID NO: 1648), M85491Seg24 (SEQ ID NO: 1639), Z21368 junc17-21 (SEQ ID NO: 1642), HSSTROL3seg24 (SEQ ID NO: 1675) and Z25299seg20 (SEQ ID NO: 1669) in cancerous lung samples relative to the normal samples.

FIG. 54 a is a histogram showing the relative expression of trophinin associated protein (tastin) [T86235] variants (e.g., variant no. 23-26, 31, 32) in normal and tumor derived lung samples as determined by real time PCR using primers for SEQ ID NO: 1480.

FIG. 54 b is a histogram showing the relative expression of trophinin associated protein (tastin) [T86235] variants (e.g., variant no. 8-10, 22, 23, 26, 27, 29-31, 33) in normal and tumor derived lung samples as determined by micro-array analysis using oligos detailed in SEQ ID NOs: 1512-1514.

FIG. 55 is a histogram showing the relative expression of Homeo box C10 (HOXC10) [N31842] variants (e.g., variant no. 3) in normal and tumor derived lung samples as determined by real time PCR using primers for SEQ ID NO: 1517.

FIGS. 56 a-b are histograms showing on two different scales the relative expression of Nucleolar protein 4 (NOL4) [T06014] variants (e.g., variant no. 3, 11 and 12) in normal and tumor derived lung samples as determined by real time PCR using primers for SEQ ID NO: 1529. FIG. 56 a shows the results on scale: 0-1200. FIG. 56 b shows the results on scale: 0-24.

FIGS. 57 a-b are histograms showing on two different scales the relative expression of Nucleolar protein 4 (NOL4) [T06014] variants (e.g., variant no. 3, 11 and 12) in normal and tumor derived lung samples as determined by real time PCR using primers for SEQ ID NO: 1532. FIG. 57 a shows the results on scale: 0-2000. FIG. 57 b shows the results on scale: 0-42.

FIG. 58 is a histogram showing the relative expression of AA281370 variants (e.g., variant no. 0, 1, 4 and 5) in normal and tumor derived lung samples as determined by real time PCR using primers for SEQ ID NO: 1558.

FIG. 59 is a histogram showing the relative expression of Sulfatase 1 (SULF1)-[Z21368] variants (e.g., variant no. 13 and 14) in normal and tumor derived lung samples as determined by real time PCR using primers for SEQ ID NO: 1574.

FIG. 60 is a histogram showing the relative expression of SRY (sex determining region Y)-box 2 (SOX2)-[HUMHMGBOX] variants (e.g., variant no. 0) in normal and tumor derived lung samples as determined by real time PCR using primers for SEQ ID NO: 1594.

FIG. 61 is a histogram showing the relative expression of Plakophilin 1 (ectodermal dysplasia/skin fragility syndrome) (PKP1)-[HSB6PR] variants (e.g., variant no. 0, 5 and 6) in normal and tumor derived lung samples as determined by real time PCR using primers for SEQ ID NO: 1600.

FIG. 62 is a histogram showing the relative expression of transcripts detectable by SEQ ID NOs: 1480, 1517, 1529, 1532, 1558, 1574, 1594, 1600, 1616, 1619, 1622, 1625 in normal and tumor derived lung samples as determined by real time PCR.

FIG. 63 is an amino acid sequence alignment, using NCBI BLAST default parameters, demonstrating similarity between the AA281370 lung cancer biomarker of the present invention and WD40 domains of various proteins involved in the MAPK signal transduction pathway. FIG. 63 a: amino acids at positions 40-790 of the AA281370 polypeptide SEQ ID NO: 99 have 75% homology to mouse Mapkbp1 protein (gi|47124622). FIG. 63 b: amino acids at positions 40-886 of the AA281370 polypeptide SEQ ID NO: 99 have 70% homology to rat JNK-binding protein JNKBP1 (gi|34856717).

FIG. 64 is a histogram showing over expression of the Homo sapiens protease, serine, 21 (testisin) (PRSS21) AA161187 transcripts, which are detectable by amplicon as depicted in sequence name AA161187 seg25 (SEQ ID NO:1654), in cancerous lung samples relative to the normal samples.

FIG. 65 is a histogram showing over expression of the protein tyrosine phosphatase, receptor type, S (PTPRS) M62069 transcripts, which are detectable by amplicon as depicted in sequence name M62069 seg19 (SEQ ID NO: 1657), in cancerous lung samples relative to the normal samples.

FIG. 66 is a histogram showing over expression of the protein tyrosine phosphatase, receptor type, S (PTPRS) M62069 transcripts, which are detectable by amplicon as depicted in sequence name M62069 seg29 (SEQ ID NO: 1660), in cancerous lung samples relative to the normal samples.

FIG. 67 is a histogram showing over expression of the above-indicated Homo sapiens collagen, type XI, alpha 1 (COL11A1) transcripts which are detectable by amplicon as depicted in sequence name HUMCA1X1A seg55 (SEQ ID NO:1663) in cancerous lung samples relative to the normal samples.

FIG. 68 is a histogram showing down regulation of the Homo sapiens secretory leukocyte protease inhibitor (antileukoproteinase) (SLPI) Z25299 transcripts which are detectable by amplicon as depicted in sequence name Z25299 seg23 (SEQ ID NO: 1672) in cancerous lung samples relative to the normal samples.

FIG. 69 is a histogram showing the expression of Secretory leukocyte protease inhibitor Acid-stable proteinase inhibitor Z25299 transcripts which are detectable by amplicon as depicted in sequence name Z25299seg20 (SEQ ID NO: 1669) in different normal tissues.

FIG. 70 is a histogram showing the expression of Secretory leukocyte protease inhibitor Acid-stable proteinase inhibitor Z25299 transcripts which are detectable by amplicon as depicted in sequence name Z25299seg23 (SEQ ID NO: 1672) in different normal tissues.

FIG. 71 is a histogram showing over expression of the Homo sapiens matrix metalloproteinase 11 (stromelysin 3) (MMP11) HSSTROL3 transcripts which are detectable by amplicon as depicted in sequence name HSSTROL3seg20-2 (SEQ ID NO: 1678) in cancerous lung samples relative to the normal samples.

FIG. 72 is a histogram showing over expression of the Homo sapiens matrix metalloproteinase 11 (stromelysin 3) (MMP11) HSSTROL3 transcripts which are detectable by amplicon as depicted in sequence name HSSTROL3junc21-27 (SEQ ID NO: 1681) in cancerous lung samples relative to the normal samples.

FIG. 73 is a histogram showing the expression of R11723 transcripts, which were detected by amplicon as depicted in the sequence name R11723junc11-18 (SEQ ID NO: 1687) in different normal tissues.

FIG. 74 is a histogram showing over expression of the Homo sapiens fibroblast growth factor receptor-like 1 (FGFRL1) H53626 transcripts, which are detectable by amplicon as depicted in sequence name H53626junc24-27F1R3 (SEQ ID NO: 1690) in cancerous lung samples relative to the normal samples.

FIG. 75 is a histogram showing the expression of the Homo sapiens fibroblast growth factor receptor-like 1 (FGFRL1) H53626 transcripts, which are detectable by amplicon as depicted in sequence name H53626seg25 (SEQ ID NO: 1693) in cancerous lung samples relative to the normal samples.

FIG. 76 is a histogram showing Cancer and cell-line vs. normal tissue expression for Cluster H53626, demonstrating overexpression in epithelial malignant tumors, a mixture of malignant tumors from different tissues and myosarcoma.

FIG. 77 is a histogram showing the expression of Homo sapiens fibroblast growth factor receptor-like 1 (FGFRL1) H53626 transcripts, which are detectable by amplicon as depicted in sequence name H53626seg25 (SEQ ID NO: 1693) in different normal tissues.

FIG. 78 is a histogram showing the expression of Homo sapiens fibroblast growth factor receptor-like 1 (FGFRL1) H53626 transcripts, which are detectable by amplicon as depicted in sequence name H53626junc24-27F1R3 (SEQ ID NO: 1690) in different normal tissues.

FIG. 79 shows PSEC R11723_PEA_(—)1 T5 (SEQ ID NO:148) PCR product; Lane 1: PCR product; and Lane 2: Low DNA Mass Ladder MW marker (Invitrogen Cat#10068-013).

FIG. 80: PSEC R11723_PEA_(—)1 T5 PCR product sequence; In Red-PSEC Forward primer; In Blue-PSEC Reverse complementary sequence; and Highlighted sequence-PSEC variant R11723_PEA_(—)1 T5 (SEQ ID NO:148) ORF.

FIG. 81: PRSEC PCR product digested with NheI and HindIII; Lane 1: PRSET PCR product; Lane 2: Fermentas GeneRuler 1 Kb DNA Ladder #SM0313.

FIG. 82 shows a plasmid map of His PSEC T5 pRSETA.

FIG. 83: Protein sequence of PSEC variant R11723_PEA_(—)1 T5 (SEQ ID NO:148); In red-6His tag; In blue-PSEC.

FIG. 84 shows the DNA sequence of His PSEC T5 pRSETA; bold-HisPSEC T5 open reading frame; Italic-flanking DNA sequence which was verified by sequence analysis.

FIG. 85 shows Western blot analysis of recombinant HisPSEC variant R11723_PEA_(—)1 T5; lane 1: molecular weight marker (ProSieve color, Cambrex, Cat #50550); lane 2: HisPSEC T5 pRSETA T0; lane 3: His HisPSEC T5 pRSETA T3; lane 4: His HisPSEC T5 pRSETA To.n; lane 5: pRSET empty vector T0 (negative control); lane 6: pRSET empty vector T3 (negative control); lane 7: pRSET empty vector To.n (negative control); and lane 8: His positive control protein (HisTroponinT7 pRSETA T3).

FIG. 86 shows the DNA sequences of WT MMP11 (MMP11_(—)488, (SEQ ID NO: 1782)) and HSSTROL3_P9 (MMP11_(—)354, (SEQ ID NO: 1783)) used for mammalian expression. NcoI and NotI sites used to subclone MMP11 fragments into bacterial vectors, without the signal peptide, are underlined. Translation initiation site and stop codons are shown in bold.

FIG. 87 shows protein sequences used for mammalian expression of WT MMP11 (MMP11_(—)488 (SEQ ID NO: 1784)) and HSSTROL3_P9 (MMP11_(—)354 (SEQ ID NO: 1785)). His-tag of 8 His residues is shown in bold.

FIG. 88 shows WT MMP11 (MMP11_(—)488) and HSSTROL3_P9 (MMP11_(—)354) in pIRESpuro3 plasmid maps. NcoI and NotI sites that were used to subclone MMP11 variants into bacterial expression vectors are marked by arrows.

FIG. 89 shows WT MMP11 (MMP11_(—)488) and HSSTROL3_P9 (MMP11_(—)354) in pET28 plasmid maps. NcoI and NotI sites that were used to subclone MMP11 (WT and variant) into bacterial expression vectors are marked with arrows.

FIG. 90 shows protein sequences used for bacterial expression of WT MMP11 (MMP11_(—)488) and HSSTROL3_P9 (MMP11_(—)354). His-tag of 8 His residues is shown in bold.

FIG. 91 shows a Coomassie staining of whole cell lysates of MMP11_(—)488 and MMP11_(—)354 in pET28. Lanes 1 to 4 and Lane 11 are unrelated to this experiment; Lane 5 is MMP11_(—)488 pET28, before induction; Lane 6 is MMP11_(—)488 pET28, 3 hrs after induction; Lane 7 is MMP11_(—)354 pET28, before induction; Lane 8 is MMP11_(—)354 pET28, 3 hrs after induction; Lane 9 is Empty pET 28, before induction; Lane 10 is Empty pET 28, 3 hrs after induction; Lane 12 is Rainbow Full Range Molecular Weight Markers GE Healthcare, RPN800.

FIG. 92 shows a western blot analysis of whole cell lysates of MMP11_(—)488 and MMP11_(—)354 in pET28 with anti-His antibody (Serotec Cat. #MCA1396). Lane 5 is MMP11_(—)488 pET28, before induction; Lane 6 is MMP11_(—)488 pET28, 3 hrs after induction; Lane 7 is MMP11_(—)354 pET28, before induction; Lane 8 is MMP11_(—)354 pET28, 3 hrs after induction; Lane 9 is Empty pET 28, before induction; Lane 10 is Empty pET 28, 3 hrs after induction; Lane 11 is Mark Western Protein Standard: Invitrogen LC5600.

FIG. 93 shows an overlay of the immunogen Peptide CGEN6301 (SEQ ID NO: 1781) on the primary sequence of the HSSTROL3_P9 protein (SEQ ID NO: 1398). The Peptide CGEN6301 (SEQ ID NO: 1781) sequence is shown in bold.

FIG. 94 shows CGEN6301 Affinity Purified Antibodies: ELISA results of Lot18976C (Rabbit 8350) and Lot18977C (Rabbit 8351).

FIG. 95 shows Western Blot Data of Affinity Purified Antibody; Lot 18976C, Rabbit 8350. HSSTROL3_P9 splice variant protein (SVr) and WT MMP11 protein (WT) were probed (in duplicates) with pre-purified serum of RB 8350 (upper left), flow through from affinity purification (upper right) and affinity purified antibody Lot 18976C (lower).

FIG. 96 shows Western Blot Data of Affinity Purified Antibody; Lot 18977C, Rabbit 8351. HSSTROL3_P9 splice variant protein (SVr) and WT MMP11 protein (WT) were probed with pre-purified serum of RB 8351 (upper left), flow through from affinity purification (upper right) and affinity purified antibody Lot 18977C (lower).

FIG. 97 shows CGEN6301 Monoclonal Purified Antibodies: ELISA results of Clone 13E1.G1.F3. (lot18944C) and Clone 7G11.F6.E1. (lot19032C).

FIG. 98 shows the alignment of the protein sequences of HUMGRP5E_P5 (SEQ ID NO: 1300, indicated in the Figure as CgenGRP) and the Wild Type GRP isoforms WT GPR 1 (SEQ ID NO: 1421), WT GPR 2 (SEQ ID NO: 1788) and WT GPR 3 (SEQ ID NO: 1789).

FIG. 99 a shows the GRP 148 DNA optimized ORF sequence (SEQ ID NO: 1790). EcoRI and NotI restriction sites are underlined. Open reading frame is shown in bold.

FIG. 99 b shows the GRP 142 DNA optimized ORF sequence (SEQ ID NO: 1791). EcoRI and NotI restriction sites are underlined. Open reading frame is shown in bold.

FIG. 100 a shows the protein sequence of recombinant GRP-148 (SEQ ID NO: 1792). IL6 signal peptide is shown in bold. The 8xHis tag is underlined.

FIG. 100 b shows the protein sequence of recombinant GRP-142 (SEQ ID NO: 1793). IL6 signal peptide is shown in bold. The 8xHis tag is underlined.

FIG. 101 a shows the schematic presentation of GRP-148 in pIRESpuro.

FIG. 101 b shows the schematic presentation of GRP-142 in pIRESpuro.

FIG. 102 shows the results of western blot analysis of mammalian expression of GRP proteins using anti His antibodies. Lane 1 shows the MW markers; Lanes 2-6 represent irrelevant proteins; Lane 7 represents GRP 148 (SEQ ID NO: 1792); Lane 8 represents GRP 142 (SEQ ID NO: 1793).

FIG. 103 shows the results of SDS-PAGE, Coomassie staining, demonstrating the analysis of purified GRP-148, shown in lane 8. Lane 1 represents a MW marker; Lanes 2-5 represent BSA 2 mg/ml, 1 mg/ml, 0.5 mg/ml, 0.25 mg/ml, respectively; Lane 6 represents BSA 1 mg/ml no DTT; Lanes 7 and 9 are empty; Lane 10 shows irrelevant protein.

FIG. 104 shows SDS-PAGE, Coomassie stained gel analysis of GRP-142 (SEQ ID NO:1793), shown in lane 6. Lanes 1-4 represent BSA 1 mg/ml, 0.5 mg/ml, 0.25 mg/ml, 0.1 mg/ml, respectively; Lane 5 corresponds to MW Marker (Cambrex prosieve).

FIG. 105 shows an overlay of HUMGRP5E_P5 immunogen (SEQ ID NO:1795) on HUMGRP5E_P5 (SEQ ID NO: 1300) protein sequence. The immunogen sequence is shown in bold.

FIG. 106 shows ELISA results of CGEN0601 Affinity Purified Antibodies, Lot18878C, Rabbit 8349.

FIG. 107 shows ELISA results of CGEN0601 Affinity Purified Antibodies, Lot 18980C, Rabbit 8348.

FIG. 108 shows Western Blot Data of Affinity Purified Antibody, Lot 18878C (Rabbit 8349). HUMGRP5E_P5 (SEQ ID NO: 1300) splice variant (SVr) and WT GRP precursor (SEQ ID NO:1421) (WT) were probed with pre-purified serum of the Rb 8349 (lanes 1 and 2), affinity purified antibody lot 18878C, Rb 8349 (lanes 5 and 6) and flow through from affinity purification, Rb 8349 (lanes 3 and 4).

FIG. 109 shows Western Blot Data of Affinity Purified Antibody, Lot 18980C (Rabbit 8348).

FIG. 110 shows ELISA Data of Rabbit 8349 Cross-adsorbed product (Lot 18978C).

FIG. 111 shows concentration of HUMGRP5E_P5 (SEQ ID NO: 1300) in control and SCLC patients' sera.

FIG. 112 is a histogram showing the expression of NTS D56406 transcripts which are detectable by amplicon as depicted in sequence name D56406_seg7-9F2R2 in normal and cancerous Lung tissues.

FIG. 113 is a histogram showing the expression of NTS D56406 transcripts which are detectable by amplicon as depicted in sequence name D56406_seg7-9F2R2 in different normal tissues.

FIG. 114 is a histogram showing the expression of SULF1 Z21368 transcripts which are detectable by amplicon as depicted in sequence name Z21368_junc59-64F1R1 (SEQ ID NO: 1801) in normal and cancerous Lung tissues.

FIG. 115 is a histogram showing the expression of SULF1 Z21368 transcripts which are detectable by amplicon as depicted in sequence name Z21368_junc59-64F1R1 (SEQ ID NO: 1801) in different normal tissues.

DESCRIPTION OF PREFERRED EMBODIMENTS

The present invention is of novel markers for lung cancer that are both sensitive and accurate. Furthermore, at least certain of these markers are able to distinguish between various types of lung cancer, such as small cell carcinoma; large cell carcinoma; squamous cell carcinoma; and adenocarcinoma, alone or in combination. These markers are differentially expressed, and preferably overexpressed, in lung cancer specifically, as opposed to normal lung tissue. The measurement of these markers, alone or in combination, in patient samples provides information that the diagnostician can correlate with a probable diagnosis of lung cancer. The markers of the present invention, alone or in combination, show a high degree of differential detection between lung cancer and non-cancerous states. The markers of the present invention, alone or in combination, can be used for prognosis, prediction, screening, early diagnosis, therapy selection and treatment monitoring of lung cancer. For example, optionally and preferably, these markers may be used for staging lung cancer and/or monitoring the progression of the disease. Furthermore, the markers of the present invention, alone or in combination, can be used for detection of the source of metastasis found in anatomical places other than lung. Also, one or more of the markers may optionally be used in combination with one or more other lung cancer markers (other than those described herein). According to an optional embodiment of the present invention, such a combination may be used to differentiate between various types of lung cancer, such as small cell carcinoma; large cell carcinoma; squamous cell carcinoma; and adenocarcinoma. Furthermore, the markers of the present invention, alone or in combination, can be used for detection of other types of tumors by elimination (for example, for such detection of carcinoid tumors, which are 5% of lung cancers).

The markers of the present invention, alone or in combination, can be used for prognosis, prediction, screening, early diagnosis, staging, therapy selection and treatment monitoring of lung cancer. For example, optionally and preferably, these markers may be used for staging lung cancer and/or monitoring the progression of the disease. Furthermore, the markers of the present invention, alone or in combination, can be used for detection of the source of metastasis found in anatomical places other than lung. Also, one or more of the markers may optionally be used in combination with one or more other lung cancer markers (other than those described herein).

Biomolecular sequences (amino acid and/or nucleic acid sequences) uncovered using the methodology of the present invention and described herein can be efficiently utilized as tissue or pathological markers and/or as drugs or drug targets for treating or preventing a disease.

These markers are specifically released to the bloodstream under conditions of lung cancer, and/or are otherwise expressed at a much higher level and/or specifically expressed in lung cancer tissue or cells. The measurement of these markers, alone or in combination, in patient samples provides information that the diagnostician can correlate with a probable diagnosis of lung cancer.

The present invention therefore also relates to diagnostic assays for lung cancer and/or an indicative condition, and methods of use of such markers for detection of lung cancer and/or an indicative condition, optionally and preferably in a sample taken from a subject (patient), which is more preferably some type of blood sample.

In another embodiment, the present invention relates to bridges, tails, heads and/or insertions, and/or analogs, homologs and derivatives of such peptides. Such bridges, tails, heads and/or insertions are described in greater detail below with regard to the Examples.

As used herein a “tail” refers to a peptide sequence at the end of an amino acid sequence that is unique to a splice variant according to the present invention. Therefore, a splice variant having such a tail may optionally be considered as a chimera, in that at least a first portion of the splice variant is typically highly homologous (often 100% identical) to a portion of the corresponding known protein, while at least a second portion of the variant comprises the tail.

As used herein a “head” refers to a peptide sequence at the beginning of an amino acid sequence that is unique to a splice variant according to the present invention. Therefore, a splice variant having such a head may optionally be considered as a chimera, in that at least a first portion of the splice variant comprises the head, while at least a second portion is typically highly homologous (often 100% identical) to a portion of the corresponding known protein.

As used herein “an edge portion” refers to a connection between two portions of a splice variant according to the present invention that were not joined in the wild type or known protein. An edge may optionally arise due to a join between the above “known protein” portion of a variant and the tail, for example, and/or may occur if an internal portion of the wild type sequence is no longer present, such that two portions of the sequence are now contiguous in the splice variant that were not contiguous in the known protein. A “bridge” may optionally be an edge portion as described above, but may also include a join between a head and a “known protein” portion of a variant, or a join between a tail and a “known protein” portion of a variant, or a join between an insertion and a “known protein” portion of a variant.

Optionally and preferably, a bridge between a tail or a head or a unique insertion, and a “known protein” portion of a variant, comprises at least about 10 amino acids, more preferably at least about 20 amino acids, most preferably at least about 30 amino acids, and even more preferably at least about 40 amino acids, in which at least one amino acid is from the tail/head/insertion and at least one amino acid is from the “known protein” portion of a variant. Also optionally, the bridge may comprise any number of amino acids from about 10 to about 40 amino acids (for example, 10, 11, 12, 13 . . . 37, 38, 39, 40 amino acids in length, or any number in between).

It should be noted that a bridge cannot be extended beyond the length of the sequence in either direction, and it should be assumed that every bridge description is to be read in such manner that the bridge length does not extend beyond the sequence itself.

Furthermore, bridges are described with regard to a sliding window in certain contexts below. For example, certain descriptions of the bridges feature the following format: a bridge between two edges (in which a portion of the known protein is not present in the variant) may optionally be described as follows: a bridge portion of CONTIG-NAME_P1 (representing the name of the protein), comprising a polypeptide having a length “n”, wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise XX (2 amino acids in the center of the bridge, one from each end of the edge), having a structure as follows (numbering according to the sequence of CONTIG-NAME_P1): a sequence starting from any of amino acid numbers 49−x to 49 (for example); and ending at any of amino acid numbers 50+((n−2)−x) (for example), in which x varies from 0 to n−2. In this example, it should also be read as including bridges in which n is any number of amino acids between 10-50 amino acids in length. Furthermore, the bridge polypeptide cannot extend beyond the sequence, so it should be read such that 49−x (for example) is not less than 1, nor 50+((n−2)−x) (for example) greater than the total sequence length.
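By way of non-limiting illustration only, the sliding window format described above may be sketched in Python as follows. The function name bridge_windows and its parameter names are hypothetical and are not part of the present disclosure; the junction position 49/50 is merely the example position used above. The sketch assumes 1-based, inclusive amino acid numbering and simply omits any window that would extend beyond the sequence.

```python
def bridge_windows(n, left_edge, seq_len):
    """Enumerate bridges of length n spanning the junction between
    residues left_edge and left_edge + 1 (1-based, inclusive).

    Hypothetical helper: names and signature are illustrative assumptions.
    """
    if n < 2:
        raise ValueError("a bridge must contain both junction residues")
    windows = []
    for x in range(0, n - 1):  # x varies from 0 to n-2
        start = left_edge - x                     # e.g. 49 - x
        end = left_edge + 1 + ((n - 2) - x)       # e.g. 50 + ((n-2) - x)
        # The bridge cannot extend beyond the sequence in either direction.
        if start < 1 or end > seq_len:
            continue
        windows.append((start, end))
    return windows


if __name__ == "__main__":
    # Example: bridges of length 10 around the 49/50 junction of a
    # 200-residue sequence.
    for start, end in bridge_windows(10, 49, 200):
        print(start, end)
```

Under these assumptions every returned window has length n and contains both junction residues, so that at least one amino acid lies on each side of the edge.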

In another embodiment, this invention provides antibodies specifically recognizing the splice variants and polypeptide fragments thereof of this invention. Preferably such antibodies differentially recognize splice variants of the present invention but do not recognize a corresponding known protein (such known proteins are discussed with regard to their splice variants in the Examples below).

In another embodiment, this invention provides an isolated nucleic acid molecule encoding for a splice variant according to the present invention, having a nucleotide sequence as set forth in any one of the sequences listed herein, or a sequence complementary thereto. In another embodiment, this invention provides an isolated nucleic acid molecule, having a nucleotide sequence as set forth in any one of the sequences listed herein, or a sequence complementary thereto. In another embodiment, this invention provides an oligonucleotide of at least about 12 nucleotides, specifically hybridizable with the nucleic acid molecules of this invention. In another embodiment, this invention provides vectors, cells, liposomes and compositions comprising the isolated nucleic acids of this invention.

In another embodiment, this invention provides a method for detecting a splice variant according to the present invention in a biological sample, comprising: contacting a biological sample with an antibody specifically recognizing a splice variant according to the present invention under conditions whereby the antibody specifically interacts with the splice variant in the biological sample but does not recognize known corresponding proteins (wherein the known protein is discussed with regard to its splice variant(s) in the Examples below), and detecting said interaction; wherein the presence of an interaction correlates with the presence of a splice variant in the biological sample.

In another embodiment, this invention provides a method for detecting a splice variant nucleic acid sequence in a biological sample, comprising: hybridizing the isolated nucleic acid molecules or oligonucleotide fragments of at least about a minimum length to a nucleic acid material of a biological sample and detecting a hybridization complex; wherein the presence of a hybridization complex correlates with the presence of a splice variant nucleic acid sequence in the biological sample.

According to the present invention, the splice variants described herein are non-limiting examples of markers for diagnosing lung cancer. Each splice variant marker of the present invention can be used alone or in combination, for various uses, including but not limited to, prognosis, prediction, screening, early diagnosis, determination of progression, therapy selection and treatment monitoring of lung cancer.

According to optional but preferred embodiments of the present invention, any marker according to the present invention may optionally be used alone or in combination. Such a combination may optionally comprise a plurality of markers described herein, optionally including any subcombination of markers, and/or a combination featuring at least one other marker, for example a known marker. Furthermore, such a combination may optionally and preferably be used as described above with regard to determining a ratio between a quantitative or semi-quantitative measurement of any marker described herein to any other marker described herein, and/or any other known marker, and/or any other marker. With regard to such a ratio between any marker described herein (or a combination thereof) and a known marker, more preferably the known marker comprises the “known protein” as described in greater detail below with regard to each cluster or gene.

According to other preferred embodiments of the present invention, a splice variant protein or a fragment thereof, or a splice variant nucleic acid sequence or a fragment thereof, may be featured as a biomarker for detecting lung cancer, such that a biomarker may optionally comprise any of the above.

According to still other preferred embodiments, the present invention optionally and preferably encompasses any amino acid sequence or fragment thereof encoded by a nucleic acid sequence corresponding to a splice variant protein as described herein. Any oligopeptide or peptide relating to such an amino acid sequence or fragment thereof may optionally also (additionally or alternatively) be used as a biomarker, including but not limited to the unique amino acid sequences of these proteins that are depicted as tails, heads, insertions, edges or bridges. The present invention also optionally encompasses antibodies capable of recognizing, and/or being elicited by, such oligopeptides or peptides.

The present invention also optionally and preferably encompasses any nucleic acid sequence or fragment thereof, or amino acid sequence or fragment thereof, corresponding to a splice variant of the present invention as described above, optionally for any application.

Non-limiting examples of methods or assays are described below.

The present invention also relates to kits based upon such diagnostic methods or assays.

Nucleic Acid Sequences and Oligonucleotides

Various embodiments of the present invention encompass nucleic acid sequences described hereinabove; fragments thereof, sequences hybridizable therewith, sequences homologous thereto, sequences encoding similar polypeptides with different codon usage, altered sequences characterized by mutations, such as deletion, insertion or substitution of one or more nucleotides, either naturally occurring or artificially induced, either randomly or in a targeted fashion.

The present invention encompasses nucleic acid sequences described herein; fragments thereof, sequences hybridizable therewith, sequences homologous thereto [e.g., at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 95% or more, say 100%, identical to the nucleic acid sequences set forth below], sequences encoding similar polypeptides with different codon usage, altered sequences characterized by mutations, such as deletion, insertion or substitution of one or more nucleotides, either naturally occurring or artificially induced, either randomly or in a targeted fashion. The present invention also encompasses homologous nucleic acid sequences (i.e., which form a part of a polynucleotide sequence of the present invention) which include sequence regions unique to the polynucleotides of the present invention.
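By way of non-limiting illustration only, the comparison of a candidate sequence against one of the identity thresholds recited above may be sketched in Python as follows. The sketch assumes that the two sequences have already been aligned by some external method and are of equal length, with gaps written as “-”, and it assumes one particular definition of identity (matching positions divided by the number of alignment columns); other definitions are in common use. The function names percent_identity and meets_threshold are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two pre-aligned sequences of equal length.

    Assumption: identity = matching non-gap columns / total columns * 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    matches = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_threshold(aligned_a, aligned_b, threshold_percent):
    """True if the two sequences are at least threshold_percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold_percent


if __name__ == "__main__":
    a = "ACGTACGTAC"
    b = "ACGTACGTAA"
    print(percent_identity(a, b))
    print(meets_threshold(a, b, 85))
```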

In cases where the polynucleotide sequences of the present invention encode previously unidentified polypeptides, the present invention also encompasses novel polypeptides or portions thereof, which are encoded by the isolated polynucleotide and respective nucleic acid fragments thereof described hereinabove.

The terms “nucleic acid fragment”, “oligonucleotide” and “polynucleotide” are used herein interchangeably to refer to a polymer of nucleic acids. A polynucleotide sequence of the present invention refers to a single or double stranded nucleic acid sequence which is isolated and provided in the form of an RNA sequence, a complementary polynucleotide sequence (cDNA), a genomic polynucleotide sequence and/or a composite polynucleotide sequence (e.g., a combination of the above).

As used herein the phrase “complementary polynucleotide sequence” refers to a sequence, which results from reverse transcription of messenger RNA using a reverse transcriptase or any other RNA dependent DNA polymerase. Such a sequence can be subsequently amplified in vivo or in vitro using a DNA dependent DNA polymerase.

As used herein the phrase “genomic polynucleotide sequence” refers to a sequence derived (isolated) from a chromosome and thus it represents a contiguous portion of a chromosome.

As used herein the phrase “composite polynucleotide sequence” refers to a sequence, which is composed of genomic and cDNA sequences. A composite sequence can include some exonal sequences required to encode the polypeptide of the present invention, as well as some intronic sequences interposing therebetween. The intronic sequences can be of any source, including of other genes, and typically will include conserved splicing signal sequences. Such intronic sequences may further include cis acting expression regulatory elements.

Preferred embodiments of the present invention encompass oligonucleotide probes.

An example of an oligonucleotide probe which can be utilized by the present invention is a single stranded polynucleotide which includes a sequence complementary to the unique sequence region of any variant according to the present invention, including but not limited to a nucleotide sequence coding for an amino acid sequence of a bridge, tail, head and/or insertion according to the present invention, and/or the equivalent portions of any nucleotide sequence given herein (including but not limited to a nucleotide sequence of a node, segment or amplicon described herein).

Alternatively, an oligonucleotide probe of the present invention can be designed to hybridize with a nucleic acid sequence encompassed by any of the above nucleic acid sequences, particularly the portions specified above, including but not limited to a nucleotide sequence coding for an amino acid sequence of a bridge, tail, head and/or insertion according to the present invention, and/or the equivalent portions of any nucleotide sequence given herein (including but not limited to a nucleotide sequence of a node, segment or amplicon described herein).
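By way of non-limiting illustration only, the derivation of a single stranded DNA probe sequence complementary to a chosen region of a target nucleic acid may be sketched in Python as follows. The function name probe_for_region, the 1-based inclusive coordinates and the example target sequence are hypothetical and are not taken from any sequence of the present invention. The sketch handles only the unambiguous bases A, C, G, T and U, and returns the reverse complement as a DNA sequence written 5′ to 3′.

```python
# Map each target base (DNA or RNA) to its complementary DNA base.
_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def probe_for_region(target, start, end):
    """Return a DNA probe (5' to 3') complementary to target[start..end].

    Coordinates are 1-based and inclusive. Hypothetical helper for
    illustration only.
    """
    if start < 1 or end > len(target) or start > end:
        raise ValueError("region must lie within the target sequence")
    region = target[start - 1:end]
    if any(base not in "ACGTUacgtu" for base in region):
        raise ValueError("region contains an unsupported base")
    # Complement each base, then reverse so the probe reads 5' to 3'.
    return region.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    # Example target is arbitrary and chosen only to show the operation.
    print(probe_for_region("ATGGCCTTAA", 3, 8))
```

In practice the region would be chosen to cover a unique sequence region of a variant, for example the nucleotides coding for a bridge, and the resulting probe length would be selected within the ranges described herein.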

Oligonucleotides designed according to the teachings of the present invention can be generated according to any oligonucleotide synthesis method known in the art such as enzymatic synthesis or solid phase synthesis. Equipment and reagents for executing solid-phase synthesis are commercially available from, for example, Applied Biosystems. Any other means for such synthesis may also be employed; the actual synthesis of the oligonucleotides is well within the capabilities of one skilled in the art and can be accomplished via established methodologies as detailed in, for example, “Molecular Cloning: A Laboratory Manual” Sambrook et al., (1989); “Current Protocols in Molecular Biology” Volumes I-III Ausubel, R. M., ed. (1994); Ausubel et al., “Current Protocols in Molecular Biology”, John Wiley and Sons, Baltimore, Md. (1989); Perbal, “A Practical Guide to Molecular Cloning”, John Wiley & Sons, New York (1988) and “Oligonucleotide Synthesis” Gait, M. J., ed. (1984) utilizing solid phase chemistry, e.g. cyanoethyl phosphoramidite followed by deprotection, desalting and purification by, for example, an automated trityl-on method or HPLC.

Oligonucleotides used according to this aspect of the present invention are those having a length selected from a range of about 10 to about 200 bases, preferably about 15 to about 150 bases, more preferably about 20 to about 100 bases, most preferably about 20 to about 50 bases. Preferably, the oligonucleotide of the present invention features at least 17, at least 18, at least 19, at least 20, at least 22, at least 25, at least 30 or at least 40 bases specifically hybridizable with the biomarkers of the present invention.

The oligonucleotides of the present invention may comprise heterocyclic nucleosides consisting of purine and pyrimidine bases, bonded in a 3′ to 5′ phosphodiester linkage.

Preferably used oligonucleotides are those modified at one or more of the backbone, internucleoside linkages or bases, as is broadly described hereinunder.

Specific examples of preferred oligonucleotides useful according to this aspect of the present invention include oligonucleotides containing modified backbones or non-natural internucleoside linkages. Oligonucleotides having modified backbones include those that retain a phosphorus atom in the backbone, as disclosed in U.S. Pat. Nos. 4,469,863; 4,476,301; 5,023,243; 5,177,196; 5,188,897; 5,264,423; 5,276,019; 5,278,302; 5,286,717; 5,321,131; 5,399,676; 5,405,939; 5,453,496; 5,455,233; 5,466,677; 5,476,925; 5,519,126; 5,536,821; 5,541,306; 5,550,111; 5,563,253; 5,571,799; 5,587,361; and 5,625,050.

Preferred modified oligonucleotide backbones include, for example, phosphorothioates, chiral phosphorothioates, phosphorodithioates, phosphotriesters, aminoalkyl phosphotriesters, methyl and other alkyl phosphonates including 3′-alkylene phosphonates and chiral phosphonates, phosphinates, phosphoramidates including 3′-amino phosphoramidate and aminoalkylphosphoramidates, thionophosphoramidates, thionoalkylphosphonates, thionoalkylphosphotriesters, and boranophosphates having normal 3′-5′ linkages, 2′-5′ linked analogs of these, and those having inverted polarity wherein the adjacent pairs of nucleoside units are linked 3′-5′ to 5′-3′ or 2′-5′ to 5′-2′. Various salts, mixed salts and free acid forms can also be used.

Alternatively, modified oligonucleotide backbones that do not include a phosphorus atom therein have backbones that are formed by short chain alkyl or cycloalkyl internucleoside linkages, mixed heteroatom and alkyl or cycloalkyl internucleoside linkages, or one or more short chain heteroatomic or heterocyclic internucleoside linkages. These include those having morpholino linkages (formed in part from the sugar portion of a nucleoside); siloxane backbones; sulfide, sulfoxide and sulfone backbones; formacetyl and thioformacetyl backbones; methylene formacetyl and thioformacetyl backbones; alkene containing backbones; sulfamate backbones; methyleneimino and methylenehydrazino backbones; sulfonate and sulfonamide backbones; amide backbones; and others having mixed N, O, S and CH₂ component parts, as disclosed in U.S. Pat. Nos. 5,034,506; 5,166,315; 5,185,444; 5,214,134; 5,216,141; 5,235,033; 5,264,562; 5,264,564; 5,405,938; 5,434,257; 5,466,677; 5,470,967; 5,489,677; 5,541,307; 5,561,225; 5,596,086; 5,602,240; 5,610,289; 5,608,046; 5,618,704; 5,623,070; 5,663,312; 5,633,360; 5,677,437; and 5,677,439.

Other oligonucleotides which can be used according to the present invention are those in which both the sugar and the internucleoside linkage, i.e., the backbone, of the nucleotide units are replaced with novel groups. The base units are maintained for complementation with the appropriate polynucleotide target. An example of such an oligonucleotide mimetic is peptide nucleic acid (PNA). United States patents that teach the preparation of PNA compounds include, but are not limited to, U.S. Pat. Nos. 5,539,082; 5,714,331; and 5,719,262, each of which is herein incorporated by reference. Other backbone modifications which can be used in the present invention are disclosed in U.S. Pat. No. 6,303,374.

Oligonucleotides of the present invention may also include base modifications or substitutions. As used herein, “unmodified” or “natural” bases include the purine bases adenine (A) and guanine (G), and the pyrimidine bases thymine (T), cytosine (C) and uracil (U). Modified bases include but are not limited to other synthetic and natural bases such as 5-methylcytosine (5-me-C), 5-hydroxymethylcytosine, xanthine, hypoxanthine, 2-aminoadenine, 6-methyl and other alkyl derivatives of adenine and guanine, 2-propyl and other alkyl derivatives of adenine and guanine, 2-thiouracil, 2-thiothymine and 2-thiocytosine, 5-halouracil and cytosine, 5-propynyl uracil and cytosine, 6-azo uracil, cytosine and thymine, 5-uracil (pseudouracil), 4-thiouracil, 8-halo, 8-amino, 8-thiol, 8-thioalkyl, 8-hydroxyl and other 8-substituted adenines and guanines, 5-halo particularly 5-bromo, 5-trifluoromethyl and other 5-substituted uracils and cytosines, 7-methylguanine and 7-methyladenine, 8-azaguanine and 8-azaadenine, 7-deazaguanine and 7-deazaadenine and 3-deazaguanine and 3-deazaadenine. Further bases particularly useful for increasing the binding affinity of the oligomeric compounds of the invention include 5-substituted pyrimidines, 6-azapyrimidines and N-2, N-6 and O-6 substituted purines, including 2-aminopropyladenine, 5-propynyluracil and 5-propynylcytosine. 5-methylcytosine substitutions have been shown to increase nucleic acid duplex stability by 0.6-1.2° C. and are presently preferred base substitutions, even more particularly when combined with 2′-O-methoxyethyl sugar modifications.

Another modification of the oligonucleotides of the invention involves chemically linking to the oligonucleotide one or more moieties or conjugates, which enhance the activity, cellular distribution or cellular uptake of the oligonucleotide. Such moieties include but are not limited to lipid moieties such as a cholesterol moiety, cholic acid, a thioether, e.g., hexyl-S-tritylthiol, a thiocholesterol, an aliphatic chain, e.g., dodecandiol or undecyl residues, a phospholipid, e.g., di-hexadecyl-rac-glycerol or triethylammonium 1,2-di-O-hexadecyl-rac-glycero-3-H-phosphonate, a polyamine or a polyethylene glycol chain, or adamantane acetic acid, a palmityl moiety, or an octadecylamine or hexylamino-carbonyl-oxycholesterol moiety, as disclosed in U.S. Pat. No. 6,303,374.

It is not necessary for all positions in a given oligonucleotide molecule to be uniformly modified, and in fact more than one of the aforementioned modifications may be incorporated in a single compound or even at a single nucleoside within an oligonucleotide.

It will be appreciated that oligonucleotides of the present invention may include further modifications for more efficient use as diagnostic agents and/or to increase bioavailability and therapeutic efficacy and to reduce cytotoxicity.

To enable cellular expression of the polynucleotides of the present invention, a nucleic acid construct according to the present invention may be used, which includes at least a coding region of one of the above nucleic acid sequences, and further includes at least one cis acting regulatory element. As used herein, the phrase “cis acting regulatory element” refers to a polynucleotide sequence, preferably a promoter, which binds a trans acting regulator and regulates the transcription of a coding sequence located downstream thereto.

Any suitable promoter sequence can be used by the nucleic acid construct of the present invention.

Preferably, the promoter utilized by the nucleic acid construct of the present invention is active in the specific cell population transformed. Examples of cell type-specific and/or tissue-specific promoters include promoters such as albumin that is liver specific, lymphoid specific promoters [Calame et al., (1988) Adv. Immunol. 43:235-275], in particular promoters of T-cell receptors [Winoto et al., (1989) EMBO J. 8:729-733] and immunoglobulins [Banerji et al. (1983) Cell 33:729-740], neuron-specific promoters such as the neurofilament promoter [Byrne et al. (1989) Proc. Natl. Acad. Sci. USA 86:5473-5477], pancreas-specific promoters [Edlund et al. (1985) Science 230:912-916] or mammary gland-specific promoters such as the milk whey promoter (U.S. Pat. No. 4,873,316 and European Application Publication No. 264,166). The nucleic acid construct of the present invention can further include an enhancer, which can be adjacent or distant to the promoter sequence and can function in up regulating the transcription therefrom.

The nucleic acid construct of the present invention preferably further includes an appropriate selectable marker and/or an origin of replication. Preferably, the nucleic acid construct utilized is a shuttle vector, which can propagate both in E. coli (wherein the construct comprises an appropriate selectable marker and origin of replication) and be compatible for propagation in cells, or integration in a gene and a tissue of choice. The construct according to the present invention can be, for example, a plasmid, a bacmid, a phagemid, a cosmid, a phage, a virus or an artificial chromosome.

Examples of suitable constructs include, but are not limited to, pcDNA3, pcDNA3.1 (+/−), pGL3, pZeoSV2 (+/−), pDisplay, pEF/myc/cyto, pCMV/myc/cyto, each of which is commercially available from Invitrogen Co. (dot invitrogen dot com). Examples of retroviral vector and packaging systems are those sold by Clontech, San Diego, Calif., including Retro-X vectors pLNCX and pLXSN, which permit cloning into multiple cloning sites and in which the transgene is transcribed from the CMV promoter. Vectors derived from Mo-MuLV are also included such as pBabe, where the transgene will be transcribed from the 5′LTR promoter.

Currently preferred in vivo nucleic acid transfer techniques include transfection with viral or non-viral constructs, such as adenovirus, lentivirus, Herpes simplex I virus, or adeno-associated virus (AAV) and lipid-based systems. Useful lipids for lipid-mediated transfer of the gene are, for example, DOTMA, DOPE, and DC-Chol [Tonkinson et al., Cancer Investigation, 14(1): 54-65 (1996)]. The most preferred constructs for use in gene therapy are viruses, most preferably adenoviruses, AAV, lentiviruses, or retroviruses. A viral construct such as a retroviral construct includes at least one transcriptional promoter/enhancer or locus-defining element(s), or other elements that control gene expression by other means such as alternate splicing, nuclear RNA export, or post-translational modification of messenger. Such vector constructs also include a packaging signal, long terminal repeats (LTRs) or portions thereof, and positive and negative strand primer binding sites appropriate to the virus used, unless it is already present in the viral construct. In addition, such a construct typically includes a signal sequence for secretion of the peptide from a host cell in which it is placed. Preferably the signal sequence for this purpose is a mammalian signal sequence or the signal sequence of the polypeptide variants of the present invention. Optionally, the construct may also include a signal that directs polyadenylation, as well as one or more restriction sites and a translation termination sequence. By way of example, such constructs will typically include a 5′ LTR, a tRNA binding site, a packaging signal, an origin of second-strand DNA synthesis, and a 3′ LTR or a portion thereof. Other vectors can be used that are non-viral, such as cationic lipids, polylysine, and dendrimers.

Hybridization Assays

Detection of a nucleic acid of interest in a biological sample may optionally be effected by hybridization-based assays using an oligonucleotide probe (non-limiting examples of probes according to the present invention were previously described).

Traditional hybridization assays include PCR, RT-PCR, Real-time PCR, RNase protection, in-situ hybridization, primer extension, Southern blots (DNA detection), dot or slot blots (DNA, RNA), and Northern blots (RNA detection) (NAT type assays are described in greater detail below). More recently, PNAs have been described (Nielsen et al. 1999, Current Opin. Biotechnol. 10:71-75). Other detection methods include kits containing probes on a dipstick setup and the like.

Hybridization based assays which allow the detection of a variant of interest (i.e., DNA or RNA) in a biological sample rely on the use of oligonucleotides which can be 10, 15, 20, or 30 to 100 nucleotides long, preferably from 10 to 50, more preferably from 40 to 50 nucleotides long.

Thus, the isolated polynucleotides (oligonucleotides) of the present invention are preferably hybridizable with any of the herein described nucleic acid sequences under moderate to stringent hybridization conditions.

Moderate to stringent hybridization conditions are characterized as follows: stringent hybridization is effected using a hybridization solution such as one containing 10% dextran sulfate, 1 M NaCl, 1% SDS and 5×10⁶ cpm ³²P labeled probe, at 65° C., with a final wash solution of 0.2×SSC and 0.1% SDS and final wash at 65° C.; whereas moderate hybridization is effected using a hybridization solution containing 10% dextran sulfate, 1 M NaCl, 1% SDS and 5×10⁶ cpm ³²P labeled probe, at 65° C., with a final wash solution of 1×SSC and 0.1% SDS and final wash at 50° C.

More generally, hybridization of short nucleic acids (below 200 bp in length, e.g. 17-40 bp in length) can be effected using the following exemplary hybridization protocols which can be modified according to the desired stringency; (i) hybridization solution of 6×SSC and 1% SDS or 3 M TMACl, 0.01 M sodium phosphate (pH 6.8), 1 mM EDTA (pH 7.6), 0.5% SDS, 100 μg/ml denatured salmon sperm DNA and 0.1% nonfat dried milk, hybridization temperature of 1-1.5° C. below the T_m, final wash solution of 3 M TMACl, 0.01 M sodium phosphate (pH 6.8), 1 mM EDTA (pH 7.6), 0.5% SDS at 1-1.5° C. below the T_m; (ii) hybridization solution of 6×SSC and 0.1% SDS or 3 M TMACl, 0.01 M sodium phosphate (pH 6.8), 1 mM EDTA (pH 7.6), 0.5% SDS, 100 μg/ml denatured salmon sperm DNA and 0.1% nonfat dried milk, hybridization temperature of 2-2.5° C. below the T_m, final wash solution of 3 M TMACl, 0.01 M sodium phosphate (pH 6.8), 1 mM EDTA (pH 7.6), 0.5% SDS at 1-1.5° C. below the T_m, final wash solution of 6×SSC, and final wash at 22° C.; (iii) hybridization solution of 6×SSC and 1% SDS or 3 M TMACl, 0.01 M sodium phosphate (pH 6.8), 1 mM EDTA (pH 7.6), 0.5% SDS, 100 μg/ml denatured salmon sperm DNA and 0.1% nonfat dried milk, hybridization temperature.

The detection of hybrid duplexes can be carried out by a number of methods. Typically, hybridization duplexes are separated from unhybridized nucleic acids and the labels bound to the duplexes are then detected. Such labels refer to radioactive, fluorescent, biological or enzymatic tags or labels of standard use in the art. A label can be conjugated to either the oligonucleotide probes or the nucleic acids derived from the biological sample.

Probes can be labeled according to numerous well known methods. Non-limiting examples of radioactive labels include ³H, ¹⁴C, ³²P, and ³⁵S. Non-limiting examples of detectable markers include ligands, fluorophores, chemiluminescent agents, enzymes, and antibodies. Other detectable markers for use with probes, which can enable an increase in sensitivity of the method of the invention, include biotin and radio-nucleotides. It will become evident to the person of ordinary skill that the choice of a particular label dictates the manner in which it is bound to the probe.

For example, oligonucleotides of the present invention can be labeled subsequent to synthesis, by incorporating biotinylated dNTPs or rNTPs, or some similar means (e.g., photo-cross-linking a psoralen derivative of biotin to RNAs), followed by addition of labeled streptavidin (e.g., phycoerythrin-conjugated streptavidin) or the equivalent. Alternatively, when fluorescently-labeled oligonucleotide probes are used, fluorescein, lissamine, phycoerythrin, rhodamine (Perkin Elmer Cetus), Cy2, Cy3, Cy3.5, Cy5, Cy5.5, Cy7, FluorX (Amersham) and others [e.g., Kricka et al. (1992), Academic Press San Diego, Calif.] can be attached to the oligonucleotides.

Those skilled in the art will appreciate that wash steps may be employed to wash away excess target DNA or probe as well as unbound conjugate. Further, standard heterogeneous assay formats are suitable for detecting the hybrids using the labels present on the oligonucleotide primers and probes.

It will be appreciated that a variety of controls may be usefully employed to improve accuracy of hybridization assays. For instance, samples may be hybridized to an irrelevant probe and treated with RNase A prior to hybridization, to assess false hybridization.

Although the present invention is not specifically dependent on the use of a label for the detection of a particular nucleic acid sequence, such a label might be beneficial, by increasing the sensitivity of the detection. Furthermore, it enables automation.

As commonly known, radioactive nucleotides can be incorporated into probes of the invention by several methods.

Probes of the invention can be utilized with naturally occurring sugar-phosphate backbones as well as modified backbones including phosphorothioates, dithionates, alkyl phosphonates and α-nucleotides and the like. Probes of the invention can be constructed of either ribonucleic acid (RNA) or deoxyribonucleic acid (DNA), and preferably of DNA.

NAT Assays

Detection of a nucleic acid of interest in a biological sample may also optionally be effected by NAT-based assays, which involve nucleic acid amplification technology, such as PCR for example (or variations thereof such as real-time PCR for example).

As used herein, a “primer” defines an oligonucleotide which is capable of annealing to (hybridizing with) a target sequence, thereby creating a double stranded region which can serve as an initiation point for DNA synthesis under suitable conditions.

Amplification of a selected, or target, nucleic acid sequence may be carried out by a number of suitable methods. See generally Kwoh et al., 1990, Am. Biotechnol. Lab. 8:14. Numerous amplification techniques have been described and can be readily adapted to suit particular needs of a person of ordinary skill. Non-limiting examples of amplification techniques include polymerase chain reaction (PCR), ligase chain reaction (LCR), strand displacement amplification (SDA), transcription-based amplification, the Qβ replicase system and NASBA (Kwoh et al., 1989, Proc. Natl. Acad. Sci. USA 86, 1173-1177; Lizardi et al., 1988, BioTechnology 6:1197-1202; Malek et al., 1994, Methods Mol. Biol., 28:253-260; and Sambrook et al., 1989, supra).

The terminology “amplification pair” (or “primer pair”) refers herein to a pair of oligonucleotides (oligos) of the present invention, which are selected to be used together in amplifying a selected nucleic acid sequence by one of a number of types of amplification processes, preferably a polymerase chain reaction. Other types of amplification processes include ligase chain reaction, strand displacement amplification, or nucleic acid sequence-based amplification, as explained in greater detail below. As commonly known in the art, the oligos are designed to bind to a complementary sequence under selected conditions.

In one particular embodiment, a nucleic acid sample from a patient is amplified under conditions which favor the amplification of the most abundant differentially expressed nucleic acid. In one preferred embodiment, RT-PCR is carried out on an mRNA sample from a patient under conditions which favor the amplification of the most abundant mRNA. In another preferred embodiment, the amplification of the differentially expressed nucleic acids is carried out simultaneously. It will be realized by a person skilled in the art that such methods could be adapted for the detection of differentially expressed proteins instead of differentially expressed nucleic acid sequences. The nucleic acid (i.e. DNA or RNA) for practicing the present invention may be obtained according to well known methods.

Oligonucleotide primers of the present invention may be of any suitable length, depending on the particular assay format and the particular needs and targeted genomes employed. Optionally, the oligonucleotide primers are at least 12 nucleotides in length, preferably between 15 and 24 nucleotides, and they may be adapted to be especially suited to a chosen nucleic acid amplification system. As commonly known in the art, the oligonucleotide primers can be designed by taking into consideration the melting point of hybridization thereof with its targeted sequence (Sambrook et al., 1989, Molecular Cloning: A Laboratory Manual, 2nd Edition, CSH Laboratories; Ausubel et al., 1989, in Current Protocols in Molecular Biology, John Wiley & Sons Inc., N.Y.).

It will be appreciated that antisense oligonucleotides may be employed to quantify expression of a splice isoform of interest. Such detection is effected at the pre-mRNA level. Essentially the ability to quantitate transcription from a splice site of interest can be effected based on splice site accessibility. Oligonucleotides may compete with splicing factors for the splice site sequences. Thus, low activity of the antisense oligonucleotide is indicative of splicing activity.

The polymerase chain reaction and other nucleic acid amplification reactions are well known in the art (various non-limiting examples of these reactions are described in greater detail below). The pair of oligonucleotides according to this aspect of the present invention are preferably selected to have compatible melting temperatures (Tm), e.g., melting temperatures which differ by less than 7° C., preferably less than 5° C., more preferably less than 4° C., most preferably less than 3° C., ideally between 3° C. and 0° C.
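
By way of non-limiting illustration only, the melting-temperature criterion for a primer pair can be sketched as follows. The sketch assumes the Wallace rule (2° C. per A or T, 4° C. per G or C) as a rough Tm estimator for short oligonucleotides; this estimator, the function names and the example primers are assumptions made for illustration and are not prescribed by the present invention.

```python
def wallace_tm(primer: str) -> int:
    """Rough Tm in deg C by the Wallace rule (an assumed estimator)."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("primer must contain only A, C, G and T")
    return 2 * at + 4 * gc


def tm_compatible(fwd: str, rev: str, max_delta: float = 7.0) -> bool:
    """True when the two estimated Tm values differ by less than max_delta."""
    return abs(wallace_tm(fwd) - wallace_tm(rev)) < max_delta


# Hypothetical primers, for illustration only.
forward = "AGCTGACCTGAAGCTGATCG"
reverse = "TTGACCGATAGCTAGGCATC"
pair_ok = tm_compatible(forward, reverse, max_delta=3.0)
```

A stricter or looser threshold (7, 5, 4 or 3° C.) can be selected through `max_delta`; a more accurate estimator such as a nearest-neighbor model could be substituted without changing the comparison.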

Polymerase Chain Reaction (PCR): The polymerase chain reaction (PCR), as described in U.S. Pat. Nos. 4,683,195 and 4,683,202 to Mullis and Mullis et al., is a method of increasing the concentration of a segment of target sequence in a mixture of genomic DNA without cloning or purification. This technology provides one approach to the problems of low target sequence concentration. PCR can be used to directly increase the concentration of the target to an easily detectable level. This process for amplifying the target sequence involves the introduction of a molar excess of two oligonucleotide primers which are complementary to their respective strands of the double-stranded target sequence to the DNA mixture containing the desired target sequence. The mixture is denatured and then allowed to hybridize. Following hybridization, the primers are extended with polymerase so as to form complementary strands. The steps of denaturation, hybridization (annealing), and polymerase extension (elongation) can be repeated as often as needed, in order to obtain relatively high concentrations of a segment of the desired target sequence.

The length of the segment of the desired target sequence is determined by the relative positions of the primers with respect to each other, and, therefore, this length is a controllable parameter. Because the desired segments of the target sequence become the dominant sequences (in terms of concentration) in the mixture, they are said to be “PCR-amplified.”
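
As a non-limiting sketch of how the primer positions fix the length of the amplified segment, the following locates a forward primer and the binding site of a reverse primer on one strand of a template and reports the distance spanned. Exact-match primer binding, the function names and the sequences are assumptions made for illustration only.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def amplicon_length(template: str, fwd: str, rev: str):
    """Length of the segment bounded by the two primers, or None if either
    primer has no exact binding site on the template (assumed model)."""
    template = template.upper()
    start = template.find(fwd.upper())
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so its site on the
    # given strand reads as the reverse complement of the primer.
    rev_site = reverse_complement(rev)
    site = template.find(rev_site, start + len(fwd))
    if site == -1:
        return None
    return site + len(rev_site) - start


# Hypothetical template and primers, for illustration only.
template = "GGATCCAGCTTGACCTAGGCATTACGGATCGTTAACC"
length = amplicon_length(template, "ATCCAGCT", "AACGATCC")
```

Moving either primer along the template changes the returned length, which is the sense in which the segment length is a controllable parameter.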

Ligase Chain Reaction (LCR or LAR): The ligase chain reaction [LCR; sometimes referred to as “Ligase Amplification Reaction” (LAR)] has developed into a well-recognized alternative method of amplifying nucleic acids. In LCR, four oligonucleotides (two adjacent oligonucleotides which uniquely hybridize to one strand of target DNA, and a complementary set of adjacent oligonucleotides which hybridize to the opposite strand) are mixed and DNA ligase is added to the mixture. Provided that there is complete complementarity at the junction, ligase will covalently link each set of hybridized molecules. Importantly, in LCR, two probes are ligated together only when they base-pair with sequences in the target sample, without gaps or mismatches. Repeated cycles of denaturation, hybridization and ligation amplify a short segment of DNA. LCR has also been used in combination with PCR to achieve enhanced detection of single-base changes: see for example Segev, PCT Publication No. WO9001069 A1 (1990). However, because the four oligonucleotides used in this assay can pair to form two short ligatable fragments, there is the potential for the generation of target-independent background signal. The use of LCR for mutant screening is limited to the examination of specific nucleic acid positions.
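
The ligation condition described above (adjacent probes base-paired to the target without gaps or mismatches) can be sketched, in a deliberately idealized and non-limiting form, as a test of whether the two probes joined end to end are exactly complementary to a contiguous stretch of the target strand. The function names and sequences are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def ligatable(target: str, probe_5p: str, probe_3p: str) -> bool:
    """True when the two probes, placed end to end, pair perfectly with one
    contiguous stretch of the target (idealized: no gap, no mismatch)."""
    joined = probe_5p.upper() + probe_3p.upper()
    return reverse_complement(joined) in target.upper()


# Hypothetical target strand and probe pair, for illustration only.
target = "TTGACGTCAGGCTAAC"
perfect = ligatable(target, "AGCCT", "GACGT")
junction_mismatch = ligatable(target, "AGCCA", "GACGT")
gap = ligatable(target, "AGCCT", "ACGT")
```

This sketch treats any mismatch as blocking ligation; it does not model the target-independent background ligation noted above.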

Self-Sustained Synthetic Reaction (3SR/NASBA): The self-sustained sequence replication reaction (3SR) is a transcription-based in vitro amplification system that can exponentially amplify RNA sequences at a uniform temperature. The amplified RNA can then be utilized for mutation detection. In this method, an oligonucleotide primer is used to add a phage RNA polymerase promoter to the 5′ end of the sequence of interest. In a cocktail of enzymes and substrates that includes a second primer, reverse transcriptase, RNase H, RNA polymerase and ribo- and deoxyribonucleoside triphosphates, the target sequence undergoes repeated rounds of transcription, cDNA synthesis and second-strand synthesis to amplify the area of interest. The use of 3SR to detect mutations is kinetically limited to screening small segments of DNA (e.g., 200-300 base pairs).

Q-Beta (Qβ) Replicase: In this method, a probe which recognizes the sequence of interest is attached to the replicatable RNA template for Qβ replicase. A previously identified major problem with false positives resulting from the replication of unhybridized probes has been addressed through use of a sequence-specific ligation step. However, available thermostable DNA ligases are not effective on this RNA substrate, so the ligation must be performed by T4 DNA ligase at low temperatures (37 degrees C.). This prevents the use of high temperature as a means of achieving specificity, as in the LCR. The ligation event can be used to detect a mutation at the junction site, but not elsewhere.

A successful diagnostic method must be very specific. A straightforward method of controlling the specificity of nucleic acid hybridization is by controlling the temperature of the reaction. While the 3SR/NASBA and Qβ systems are all able to generate a large quantity of signal, one or more of the enzymes involved in each cannot be used at high temperature (i.e., >55 degrees C.). Therefore the reaction temperatures cannot be raised to prevent non-specific hybridization of the probes. If probes are shortened in order to make them melt more easily at low temperatures, the likelihood of having more than one perfect match in a complex genome increases. For these reasons, PCR and LCR currently dominate the research field in detection technologies.

The basis of the amplification procedure in the PCR and LCR is the fact that the products of one cycle become usable templates in all subsequent cycles, consequently doubling the population with each cycle. The final yield of any such doubling system can be expressed as: (1+X)^n = y, where “X” is the mean efficiency (percent copied in each cycle), “n” is the number of cycles, and “y” is the overall efficiency, or yield of the reaction. If every copy of a target DNA is utilized as a template in every cycle of a polymerase chain reaction, then the mean efficiency is 100%. If 20 cycles of PCR are performed, then the yield will be 2²⁰, or 1,048,576 copies of the starting material. If the reaction conditions reduce the mean efficiency to 85%, then the yield in those 20 cycles will be only 1.85²⁰, or 220,513 copies of the starting material. In other words, a PCR running at 85% efficiency will yield only 21% as much final product, compared to a reaction running at 100% efficiency. A reaction that is reduced to 50% mean efficiency will yield less than 1% of the possible product.

In practice, routine polymerase chain reactions rarely achieve the theoretical maximum yield, and PCRs are usually run for more than 20 cycles to compensate for the lower yield. At 50% mean efficiency, it would take 34 cycles to achieve the million-fold amplification theoretically possible in 20, and at lower efficiencies, the number of cycles required becomes prohibitive. In addition, any background products that amplify with a better mean efficiency than the intended target will become the dominant products.
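
The yield relationship (1+X)^n = y can be checked numerically with the following minimal sketch, in which X is the mean efficiency expressed as a fraction and n is the number of cycles. The function names are hypothetical and the sketch is offered for illustration only.

```python
def pcr_yield(mean_efficiency: float, cycles: int) -> float:
    """Fold amplification y = (1 + X) ** n for mean efficiency X (0 to 1)."""
    return (1.0 + mean_efficiency) ** cycles


def relative_yield(mean_efficiency: float, cycles: int) -> float:
    """Yield as a fraction of the yield at 100% mean efficiency."""
    return pcr_yield(mean_efficiency, cycles) / pcr_yield(1.0, cycles)


ideal_20 = pcr_yield(1.0, 20)          # 2 ** 20
at_85_vs_ideal = relative_yield(0.85, 20)
at_50_vs_ideal = relative_yield(0.50, 20)
at_50_after_34 = pcr_yield(0.50, 34)
```

Under this model, 34 cycles at 50% mean efficiency give just under a million-fold amplification, which is the approximate equivalence with 20 ideal cycles referred to above.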

Also, many variables can influence the mean efficiency of PCR, including target DNA length and secondary structure, primer length and design, primer and dNTP concentrations, and buffer composition, to name but a few. Contamination of the reaction with exogenous DNA (e.g., DNA spilled onto lab surfaces) or cross-contamination is also a major consideration. Reaction conditions must be carefully optimized for each different primer pair and target sequence, and the process can take days, even for an experienced investigator. The laboriousness of this process, including numerous technical considerations and other factors, presents a significant drawback to using PCR in the clinical setting. Indeed, PCR has yet to penetrate the clinical market in a significant way. The same concerns arise with LCR, as LCR must also be optimized to use different oligonucleotide sequences for each target sequence. In addition, both methods require expensive equipment, capable of precise temperature cycling.

Many applications of nucleic acid detection technologies, such as in studies of allelic variation, involve not only detection of a specific sequence in a complex background, but also the discrimination between sequences with few, or single, nucleotide differences. One method of the detection of allele-specific variants by PCR is based upon the fact that it is difficult for Taq polymerase to synthesize a DNA strand when there is a mismatch between the template strand and the 3′ end of the primer. An allele-specific variant may be detected by the use of a primer that is perfectly matched with only one of the possible alleles; the mismatch to the other allele acts to prevent the extension of the primer, thereby preventing the amplification of that sequence. This method has a substantial limitation in that the base composition of the mismatch influences the ability to prevent extension across the mismatch, and certain mismatches do not prevent extension or have only a minimal effect.
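
The allele-specific priming principle can be illustrated, in idealized and non-limiting form, by asking whether the 3′-terminal base of a primer matches a given allele. For simplicity the primer is compared with the same-sense allele sequence at a known position; the function name and sequences are hypothetical, and the sketch deliberately ignores the stated limitation that some mismatches are still extended.

```python
def three_prime_matched(primer: str, allele: str, position: int) -> bool:
    """True when the primer's 3'-terminal base matches the allele sequence
    at the binding site starting at `position` (idealized model)."""
    primer = primer.upper()
    site = allele.upper()[position:position + len(primer)]
    return len(site) == len(primer) and site[-1] == primer[-1]


# Hypothetical alleles differing by a single base (G versus A).
wild_type = "CCGATTGACGTA"
mutant = "CCGATTGACATA"
primer = "GATTGACG"   # 3' end placed on the variable base

extends_on_wild_type = three_prime_matched(primer, wild_type, 2)
extends_on_mutant = three_prime_matched(primer, mutant, 2)
```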

A similar 3′-mismatch strategy is used with greater effect to prevent ligation in the LCR. Any mismatch effectively blocks the action of the thermostable ligase, but LCR still has the drawback of target-independent background ligation products initiating the amplification. Moreover, the combination of PCR with subsequent LCR to identify the nucleotides at individual positions is also a clearly cumbersome proposition for the clinical laboratory.

The direct detection method according to various preferred embodiments of the present invention may be, for example, a cycling probe reaction (CPR) or a branched DNA analysis.

When a sufficient amount of a nucleic acid to be detected is available, there are advantages to detecting that sequence directly, instead of making more copies of that target (e.g., as in PCR and LCR). Most notably, a method that does not amplify the signal exponentially is more amenable to quantitative analysis. Even if the signal is enhanced by attaching multiple dyes to a single oligonucleotide, the correlation between the final signal intensity and amount of target is direct. Such a system has an additional advantage that the products of the reaction will not themselves promote further reaction, so contamination of lab surfaces by the products is not as much of a concern. Recently devised techniques have sought to eliminate the use of radioactivity and/or improve the sensitivity in automatable formats. Two examples are the “Cycling Probe Reaction” (CPR), and “Branched DNA” (bDNA).

Cycling probe reaction (CPR): The cycling probe reaction (CPR) uses a long chimeric oligonucleotide in which a central portion is made of RNA while the two termini are made of DNA. Hybridization of the probe to a target DNA and exposure to a thermostable RNase H causes the RNA portion to be digested. This destabilizes the remaining DNA portions of the duplex, releasing the remainder of the probe from the target DNA and allowing another probe molecule to repeat the process. The signal, in the form of cleaved probe molecules, accumulates at a linear rate. While the repeating process increases the signal, the RNA portion of the oligonucleotide is vulnerable to RNases that may be carried through sample preparation.

Branched DNA: Branched DNA (bDNA) involves oligonucleotides with branched structures that allow each individual oligonucleotide to carry 35 to 40 labels (e.g., alkaline phosphatase enzymes). While this enhances the signal from a hybridization event, signal from non-specific binding is similarly increased.

The detection of at least one sequence change according to various preferred embodiments of the present invention may be accomplished by, for example, restriction fragment length polymorphism (RFLP) analysis, allele specific oligonucleotide (ASO) analysis, Denaturing/Temperature Gradient Gel Electrophoresis (DGGE/TGGE), Single-Strand Conformation Polymorphism (SSCP) analysis or Dideoxy fingerprinting (ddF).

The demand for tests which allow the detection of specific nucleic acid sequences and sequence changes is growing rapidly in clinical diagnostics. As nucleic acid sequence data for genes from humans and pathogenic organisms accumulates, the demand for fast, cost-effective, and easy-to-use tests for as yet unknown mutations within specific sequences is rapidly increasing.

A handful of methods have been devised to scan nucleic acid segments for mutations. One option is to determine the entire gene sequence of each test sample (e.g., a bacterial isolate). For sequences under approximately 600 nucleotides, this may be accomplished using amplified material (e.g., PCR reaction products). This avoids the time and expense associated with cloning the segment of interest. However, specialized equipment and highly trained personnel are required, and the method is too labor-intensive and expensive to be practical and effective in the clinical setting.

In view of the difficulties associated with sequencing, a given segment of nucleic acid may be characterized on several other levels. At the lowest resolution, the size of the molecule can be determined by electrophoresis by comparison to a known standard run on the same gel. A more detailed picture of the molecule may be achieved by cleavage with combinations of restriction enzymes prior to electrophoresis, to allow construction of an ordered map. The presence of specific sequences within the fragment can be detected by hybridization of a labeled probe, or the precise nucleotide sequence can be determined by partial chemical degradation or by primer extension in the presence of chain-terminating nucleotide analogs.

Restriction fragment length polymorphism (RFLP): For detection of single-base differences between like sequences, the requirements of the analysis are often at the highest level of resolution. For cases in which the position of the nucleotide in question is known in advance, several methods have been developed for examining single base changes without direct sequencing. For example, if a mutation of interest happens to fall within a restriction recognition sequence, a change in the pattern of digestion can be used as a diagnostic tool (e.g., restriction fragment length polymorphism [RFLP] analysis).
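
As a non-limiting illustration of how a single base change within a recognition sequence alters the digestion pattern, the following sketch computes fragment lengths for a linear molecule digested at the EcoRI site GAATTC (cut after the first G). The choice of EcoRI, the function name and the sequences are assumptions made for illustration only.

```python
def fragment_lengths(seq: str, site: str = "GAATTC", cut_offset: int = 1):
    """Fragment lengths after complete digestion of a linear sequence.

    `cut_offset` is the distance from the start of the recognition site to
    the cut on the given strand (1 for EcoRI, which cuts G^AATTC).
    """
    seq = seq.upper()
    cuts = []
    index = seq.find(site)
    while index != -1:
        cuts.append(index + cut_offset)
        index = seq.find(site, index + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Hypothetical sequences: the mutant carries a single A-to-G change that
# destroys the recognition site.
wild_type = "ATATGAATTCGCGCGCGCAT"
mutant = "ATATGGATTCGCGCGCGCAT"

wild_type_pattern = fragment_lengths(wild_type)
mutant_pattern = fragment_lengths(mutant)
```

The wild-type sequence yields two fragments while the mutant remains uncut, which is the kind of pattern change an RFLP analysis reads out on a gel.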

Single point mutations have been also detected by the creation or destruction of RFLPs. Mutations are detected and localized by the presence and size of the RNA fragments generated by cleavage at the mismatches. Single nucleotide mismatches in DNA heteroduplexes are also recognized and cleaved by some chemicals, providing an alternative strategy to detect single base substitutions, generically named the “Mismatch Chemical Cleavage” (MCC). However, this method requires the use of osmium tetroxide and piperidine, two highly noxious chemicals which are not suited for use in a clinical laboratory.

RFLP analysis suffers from low sensitivity and requires a large amount of sample. When RFLP analysis is used for the detection of point mutations, it is, by its nature, limited to the detection of only those single base changes which fall within a restriction sequence of a known restriction endonuclease. Moreover, the majority of the available enzymes have 4 to 6 base-pair recognition sequences, and cleave too frequently for many large-scale DNA manipulations. Thus, it is applicable only in a small fraction of cases, as most mutations do not fall within such sites.

A handful of rare-cutting restriction enzymes with 8 base-pair specificities have been isolated and these are widely used in genetic mapping, but these enzymes are few in number, are limited to the recognition of G+C-rich sequences, and cleave at sites that tend to be highly clustered. Recently, endonucleases encoded by group I introns have been discovered that might have greater than 12 base-pair specificity, but again, these are few in number.
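
The relationship between recognition-sequence length and cutting frequency can be made concrete with a simple estimate. The sketch below assumes a random sequence with equal base frequencies, in which case a specific site of length n is expected once every 4^n base pairs; real genomes depart from this assumption (as the clustering of rare-cutter sites noted above shows), and the function name is hypothetical.

```python
def expected_site_spacing(site_length: int) -> int:
    """Expected average distance in bp between occurrences of one specific
    recognition sequence, assuming random sequence with equal base use."""
    return 4 ** site_length


spacing_by_length = {n: expected_site_spacing(n) for n in (4, 6, 8, 12)}
```

Under this assumption a 4 base-pair cutter cleaves about every 256 bp and a 6 base-pair cutter about every 4,096 bp, whereas an 8 base-pair cutter cleaves about every 65,536 bp.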

Allele specific oligonucleotide (ASO): If the change is not in a recognition sequence, then allele-specific oligonucleotides (ASOs) can be designed to hybridize in proximity to the mutated nucleotide, such that a primer extension or ligation event can be used as the indicator of a match or a mismatch. Hybridization with radioactively labeled allele-specific oligonucleotides (ASO) also has been applied to the detection of specific point mutations. The method is based on the differences in the melting temperature of short DNA fragments differing by a single nucleotide. Stringent hybridization and washing conditions can differentiate between mutant and wild-type alleles. The ASO approach applied to PCR products also has been extensively utilized by various researchers to detect and characterize point mutations in ras genes and gsp/gip oncogenes. Because of the presence of various nucleotide changes in multiple positions, the ASO method requires the use of many oligonucleotides to cover all possible oncogenic mutations.

With either of the techniques described above (i.e., RFLP and ASO), the precise location of the suspected mutation must be known in advance of the test. That is to say, they are inapplicable when one needs to detect the presence of a mutation at an unknown position within a gene or sequence of interest.

Denaturing/Temperature Gradient Gel Electrophoresis (DGGE/TGGE): Two other methods rely on detecting changes in electrophoretic mobility in response to minor sequence changes. One of these methods, termed “Denaturing Gradient Gel Electrophoresis” (DGGE), is based on the observation that slightly different sequences will display different patterns of local melting when electrophoretically resolved on a gradient gel. In this manner, variants can be distinguished, as differences in melting properties of homoduplexes versus heteroduplexes differing in a single nucleotide can detect the presence of mutations in the target sequences because of the corresponding changes in their electrophoretic mobilities. The fragments to be analyzed, usually PCR products, are “clamped” at one end by a long stretch of G−C base pairs (30-80) to allow complete denaturation of the sequence of interest without complete dissociation of the strands. The attachment of a GC “clamp” to the DNA fragments increases the fraction of mutations that can be recognized by DGGE. Attaching a GC clamp to one primer is critical to ensure that the amplified sequence has a low dissociation temperature. Modifications of the technique have been developed, using temperature gradients, and the method can be also applied to RNA:RNA duplexes.

Limitations on the utility of DGGE include the requirement that the denaturing conditions must be optimized for each type of DNA to be tested. Furthermore, the method requires specialized equipment to prepare the gels and maintain the needed high temperatures during electrophoresis. The expense associated with the synthesis of the clamping tail on one oligonucleotide for each sequence to be tested is also a major consideration. In addition, long running times are required for DGGE. The long running time of DGGE was shortened in a modification of DGGE called constant denaturant gel electrophoresis (CDGE). CDGE requires that gels be performed under different denaturant conditions in order to reach high efficiency for the detection of mutations.

A technique analogous to DGGE, termed temperature gradient gel electrophoresis (TGGE), uses a thermal gradient rather than a chemical denaturant gradient. TGGE requires the use of specialized equipment which can generate a temperature gradient perpendicularly oriented relative to the electrical field. TGGE can detect mutations in relatively small fragments of DNA; therefore, scanning of large gene segments requires the use of multiple PCR products prior to running the gel.

Single-Strand Conformation Polymorphism (SSCP): Another common method, called “Single-Strand Conformation Polymorphism” (SSCP), was developed by Hayashi, Sekiya and colleagues and is based on the observation that single strands of nucleic acid can take on characteristic conformations in non-denaturing conditions, and these conformations influence electrophoretic mobility. The complementary strands assume sufficiently different structures that one strand may be resolved from the other. Changes in sequences within the fragment will also change the conformation, consequently altering the mobility and allowing this to be used as an assay for sequence variations.

The SSCP process involves denaturing a DNA segment (e.g., a PCR product) that is labeled on both strands, followed by slow electrophoretic separation on a non-denaturing polyacrylamide gel, so that intra-molecular interactions can form and not be disturbed during the run. This technique is extremely sensitive to variations in gel composition and temperature. A serious limitation of this method is the relative difficulty encountered in comparing data generated in different laboratories, under apparently similar conditions.

Dideoxy fingerprinting (ddF): Dideoxy fingerprinting (ddF) is another technique developed to scan genes for the presence of mutations. The ddF technique combines components of Sanger dideoxy sequencing with SSCP. A dideoxy sequencing reaction is performed using one dideoxy terminator and then the reaction products are electrophoresed on nondenaturing polyacrylamide gels to detect alterations in mobility of the termination segments as in SSCP analysis. While ddF is an improvement over SSCP in terms of increased sensitivity, ddF requires the use of expensive dideoxynucleotides and this technique is still limited to the analysis of fragments of the size suitable for SSCP (i.e., fragments of 200-300 bases for optimal detection of mutations).

In addition to the above limitations, all of these methods are limited as to the size of the nucleic acid fragment that can be analyzed. For the direct sequencing approach, sequences of greater than 600 base pairs require cloning, with the consequent delays and expense of either deletion sub-cloning or primer walking, in order to cover the entire fragment. SSCP and DGGE have even more severe size limitations. Because of reduced sensitivity to sequence changes, these methods are not considered suitable for larger fragments. Although SSCP is reportedly able to detect 90% of single-base substitutions within a 200 base-pair fragment, the detection drops to less than 50% for 400 base-pair fragments. Similarly, the sensitivity of DGGE decreases as the length of the fragment reaches 500 base pairs. The ddF technique, as a combination of direct sequencing and SSCP, is also limited by the relatively small size of the DNA that can be screened.

According to a presently preferred embodiment of the present invention, the step of searching for any of the nucleic acid sequences described here, in tumor cells or in cells derived from a cancer patient, is effected by any suitable technique, including, but not limited to, nucleic acid sequencing, polymerase chain reaction, ligase chain reaction, self-sustained synthetic reaction, Qβ-Replicase, cycling probe reaction, branched DNA, restriction fragment length polymorphism analysis, mismatch chemical cleavage, heteroduplex analysis, allele-specific oligonucleotides, denaturing gradient gel electrophoresis, constant denaturant gel electrophoresis, temperature gradient gel electrophoresis and dideoxy fingerprinting.

Detection may also optionally be performed with a chip or other such device. The nucleic acid sample which includes the candidate region to be analyzed is preferably isolated, amplified and labeled with a reporter group. This reporter group can be a fluorescent group such as phycoerythrin. The labeled nucleic acid is then incubated with the probes immobilized on the chip using a fluidics station. The fabrication of fluidics devices, and particularly microcapillary devices, in silicon and glass substrates has been described in the art.

Once the reaction is completed, the chip is inserted into a scanner and patterns of hybridization are detected. The hybridization data is collected as a signal emitted from the reporter groups already incorporated into the nucleic acid, which is now bound to the probes attached to the chip. Since the sequence and position of each probe immobilized on the chip is known, the identity of the nucleic acid hybridized to a given probe can be determined.
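
Because each probe's position on the chip is known, the read-out step amounts to a lookup from position to probe identity. The following minimal sketch illustrates this under assumed conditions: the grid coordinates, probe names, intensity values and the fixed signal threshold are all hypothetical and do not describe any particular scanner or chip format.

```python
def identify_hybridized(probe_layout, intensities, threshold):
    """Map each chip position whose signal meets the threshold to the probe
    known to be immobilized at that position."""
    hits = {}
    for position, signal in intensities.items():
        if signal >= threshold and position in probe_layout:
            hits[position] = probe_layout[position]
    return hits


# Hypothetical layout (row, column) -> probe name, and scanner read-out.
probe_layout = {(0, 0): "probe_A", (0, 1): "probe_B", (1, 0): "probe_C"}
intensities = {(0, 0): 1250.0, (0, 1): 40.0, (1, 0): 980.0}

hits = identify_hybridized(probe_layout, intensities, threshold=500.0)
```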

It will be appreciated that when utilized along with automated equipment, the above described detection methods can be used to screen multiple samples for a disease and/or pathological condition both rapidly and easily.

Amino Acid Sequences and Peptides

The terms “polypeptide,” “peptide” and “protein” are used interchangeably herein to refer to a polymer of amino acid residues. The terms apply to amino acid polymers in which one or more amino acid residue is an analog or mimetic of a corresponding naturally occurring amino acid, as well as to naturally occurring amino acid polymers. Polypeptides can be modified, e.g., by the addition of carbohydrate residues to form glycoproteins. The terms “polypeptide,” “peptide” and “protein” include glycoproteins, as well as non-glycoproteins.

Polypeptide products can be biochemically synthesized such as by employing standard solid phase techniques. Such methods include but are not limited to exclusive solid phase synthesis, partial solid phase synthesis methods, fragment condensation and classical solution synthesis. These methods are preferably used when the peptide is relatively short (i.e., 10 kDa) and/or when it cannot be produced by recombinant techniques (i.e., not encoded by a nucleic acid sequence) and therefore involves different chemistry.

Solid phase polypeptide synthesis procedures are well known in the art and further described by John Morrow Stewart and Janis Dillaha Young, Solid Phase Peptide Synthesis (2nd Ed., Pierce Chemical Company, 1984).

Synthetic polypeptides can optionally be purified by preparative high performance liquid chromatography [Creighton T. (1983) Proteins, structures and molecular principles. WH Freeman and Co. N.Y.], after which their composition can be confirmed via amino acid sequencing.

In cases where large amounts of a polypeptide are desired, it can be generated using recombinant techniques such as described by Bitter et al. (1987) Methods in Enzymol. 153:516-544, Studier et al. (1990) Methods in Enzymol. 185:60-89, Brisson et al. (1984) Nature 310:511-514, Takamatsu et al. (1987) EMBO J. 6:307-311, Coruzzi et al. (1984) EMBO J. 3:1671-1680, Broglie et al. (1984) Science 224:838-843, Gurley et al. (1986) Mol. Cell. Biol. 6:559-565 and Weissbach & Weissbach, 1988, Methods for Plant Molecular Biology, Academic Press, NY, Section VIII, pp 421-463.

The present invention also encompasses polypeptides encoded by the polynucleotide sequences of the present invention, as well as polypeptides according to the amino acid sequences described herein. The present invention also encompasses homologues of these polypeptides; such homologues can be at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 95% or more, say 100%, homologous to the amino acid sequences set forth below, as can be determined using BlastP software of the National Center for Biotechnology Information (NCBI) using default parameters, optionally and preferably including the following: filtering on (this option filters repetitive or low-complexity sequences from the query using the Seg (protein) program), scoring matrix is BLOSUM62 for proteins, word size is 3, E value is 10, gap costs are 11, 1 (initialization and extension), and number of alignments shown is 50. Optionally, nucleic acid sequence identity/homology may be determined by using BlastN software of the National Center for Biotechnology Information (NCBI) using default parameters, which preferably include using the DUST filter program, and also preferably include having an E value of 10, filtering low complexity sequences and a word size of 11. Finally, the present invention also encompasses fragments of the above described polypeptides and polypeptides having mutations, such as deletions, insertions or substitutions of one or more amino acids, either naturally occurring or artificially induced, either randomly or in a targeted fashion.
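
By way of non-limiting illustration only, the comparison of a candidate homologue against the percentage levels recited above can be sketched as follows. The sketch assumes that a pairwise alignment has already been produced elsewhere (for example by BlastP) and that gaps are written as "-"; it counts only identical residues and does not reproduce BLAST scoring. The function names percent_identity and highest_threshold_met are hypothetical and are not part of any NCBI software.

```python
# Illustrative sketch only: percent identity of two ALREADY ALIGNED sequences.
# Assumption: gaps are encoded as "-" and gapped columns are not counted.
# This is not BLAST; it does not compute alignments, positives or E values.

# Percentage levels as recited in the text (50% ... 85%, 95%).
THRESHOLDS = (50, 55, 60, 65, 70, 75, 80, 85, 95)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Return percent identical residues over ungapped alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def highest_threshold_met(identity: float):
    """Return the highest recited level that the identity reaches, or None."""
    met = [t for t in THRESHOLDS if identity >= t]
    return max(met) if met else None


if __name__ == "__main__":
    identity = percent_identity("ACDE-G", "ACDFHG")
    print(identity, highest_threshold_met(identity))
```

Gapped columns are excluded here purely for simplicity; other conventions (for example, dividing by the full alignment length) give different percentages and may be preferred in practice.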

It will be appreciated that peptides identified according to the present invention may be degradation products, synthetic peptides or recombinant peptides as well as peptidomimetics, typically, synthetic peptides and peptoids and semipeptoids which are peptide analogs, which may have, for example, modifications rendering the peptides more stable while in a body or more capable of penetrating into cells. Such modifications include, but are not limited to, N terminus modification, C terminus modification, peptide bond modification, including, but not limited to, CH2-NH, CH2-S, CH2-S═O, O═C—NH, CH2-O, CH2-CH2, S═C—NH, CH═CH or CF═CH, backbone modifications, and residue modification. Methods for preparing peptidomimetic compounds are well known in the art and are specified. Further details in this respect are provided hereinunder.

Peptide bonds (—CO—NH—) within the peptide may be substituted, for example, by N-methylated bonds (—N(CH3)-CO—), ester bonds (—C(R)H—C—O—O—C(R)—N—), ketomethylene bonds (—CO—CH2-), α-aza bonds (—NH—N(R)—CO—), wherein R is any alkyl, e.g., methyl, carba bonds (—CH2-NH—), hydroxyethylene bonds (—CH(OH)—CH2-), thioamide bonds (—CS—NH—), olefinic double bonds (—CH═CH—), retro amide bonds (—NH—CO—), peptide derivatives (—N(R)—CH2-CO—), wherein R is the “normal” side chain, naturally presented on the carbon atom.

These modifications can occur at any of the bonds along the peptide chain and even at several (2-3) at the same time.

Natural aromatic amino acids, Trp, Tyr and Phe, may be substituted by synthetic non-natural amino acids such as phenylglycine, TIC, naphthylalanine (Nol), ring-methylated derivatives of Phe, halogenated derivatives of Phe or o-methyl-Tyr.

In addition to the above, the peptides of the present invention may also include one or more modified amino acids or one or more non-amino acid monomers (e.g. fatty acids, complex carbohydrates etc).

As used herein in the specification and in the claims section below the term “amino acid” or “amino acids” is understood to include the 20 naturally occurring amino acids; those amino acids often modified post-translationally in vivo, including, for example, hydroxyproline, phosphoserine and phosphothreonine; and other unusual amino acids including, but not limited to, 2-aminoadipic acid, hydroxylysine, isodesmosine, nor-valine, nor-leucine and ornithine. Furthermore, the term “amino acid” includes both D- and L-amino acids.

Table 1 lists non-conventional or modified amino acids which can be used with the present invention.

TABLE 1 Non-conventional amino acid Code Non-conventional amino acidCode α-aminobutyric acid Abu L-N-methylalanine Nmalaα-amino-α-methylbutyrate Mgabu L-N-methylarginine Nmargaminocyclopropane- Cpro L-N-methylasparagine Nmasn CarboxylateL-N-methylaspartic acid Nmasp Aminoisobutyric acid AibL-N-methylcysteine Nmcys aminonorbornyl- Norb L-N-methylglutamine NmginCarboxylate L-N-methylglutamic acid Nmglu Cyclohexylalanine ChexaL-N-methylhistidine Nmhis Cyclopentylalanine Cpen L-N-methylisolleucineNmile D-alanine Dal L-N-methylleucine Nmleu D-arginine DargL-N-methyllysine Nmlys D-aspartic acid Dasp L-N-methylmethionine NmmetD-cysteine Dcys L-N-methylnorleucine Nmnle D-glutamine DglnL-N-methylnorvaline Nmnva D-glutamic acid Dglu L-N-methylornithine NmornD-histidine Dhis L-N-methylphenylalanine Nmphe D-isoleucine DileL-N-methylproline Nmpro D-leucine Dleu L-N-methylserine Nmser D-lysineDlys L-N-methylthreonine Nmthr D-methionine Dmet L-N-methyltryptophanNmtrp D-ornithine Dorn L-N-methyltyrosine Nmtyr D-phenylalanine DpheL-N-methylvaline Nmval D-proline Dpro L-N-methylethylglycine NmetgD-serine Dser L-N-methyl-t-butylglycine Nmtbug D-threonine DthrL-norleucine Nle D-tryptophan Dtrp L-norvaline Nva D-tyrosine Dtyrα-methyl-aminoisobutyrate Maib D-valine Dval α-methyl-γ-aminobutyrateMgabu D-α-methylalanine Dmala α-methylcyclohexylalanine MchexaD-α-methylarginine Dmarg α-methylcyclopentylalanine McpenD-α-methylasparagine Dmasn α-methyl-α-napthylalanine ManapD-α-methylaspartate Dmasp α-methylpenicillamine Mpen D-α-methylcysteineDmcys N-(4-aminobutyl)glycine Nglu D-α-methylglutamine DmglnN-(2-aminoethyl)glycine Naeg D-α-methylhistidine DmhisN-(3-aminopropyl)glycine Norn D-α-methylisoleucine DmileN-amino-α-methylbutyrate Nmaabu D-α-methylleucine Dmleu α-napthylalanineAnap D-α-methyllysine Dmlys N-benzylglycine Nphe D-α-methylmethionineDmmet N-(2-carbamylethyl)glycine Ngln D-α-methylornithine DmornN-(carbamylmethyl)glycine Nasn D-α-methylphenylalanine 
DmpheN-(2-carboxyethyl)glycine Nglu D-α-methylproline DmproN-(carboxymethyl)glycine Nasp D-α-methylserine Dmser N-cyclobutylglycineNcbut D-α-methylthreonine Dmthr N-cycloheptylglycine NchepD-α-methyltryptophan Dmtrp N-cyclohexylglycine Nchex D-α-methyltyrosineDmty N-cyclodecylglycine Ncdec D-α-methylvaline DmvalN-cyclododeclglycine Ncdod D-α-methylalnine Dnmala N-cyclooctylglycineNcoct D-α-methylarginine Dnmarg N-cyclopropylglycine NcproD-α-methylasparagine Dnmasn N-cycloundecylglycine NcundD-α-methylasparatate Dnmasp N-(2,2-diphenylethyl)glycine NbhmD-α-methylcysteine Dnmcys N-(3,3-diphenylpropyl)glycine NbheD-N-methylleucine Dnmleu N-(3-indolylyethyl) glycine NhtrpD-N-methyllysine Dnmlys N-methyl-γ-aminobutyrate NmgabuN-methylcyclohexylalanine Nmchexa D-N-methylmethionine DnmmetD-N-methylornithine Dnmorn N-methylcyclopentylalanine NmcpenN-methylglycine Nala D-N-methylphenylalanine DnmpheN-methylaminoisobutyrate Nmaib D-N-methylproline DnmproN-(1-methylpropyl)glycine Nile D-N-methylserine DnmserN-(2-methylpropyl)glycine Nile D-N-methylserine DnmserN-(2-methylpropyl)glycine Nleu D-N-methylthreonine DnmthrD-N-methyltryptophan Dnmtrp N-(1-methylethyl)glycine NvaD-N-methyltyrosine Dnmtyr N-methyla-napthylalanine NmanapD-N-methylvaline Dnmval N-methylpenicillamine Nmpen γ-aminobutyric acidGabu N-(p-hydroxyphenyl)glycine Nhtyr L-t-butylglycine TbugN-(thiomethyl)glycine Ncys L-ethylglycine Etg penicillamine PenL-homophenylalanine Hphe L-α-methylalanine Mala L-α-methylarginine MargL-α-methylasparagine Masn L-α-methylaspartate MaspL-α-methyl-t-butylglycine Mtbug L-α-methylcysteine McysL-methylethylglycine Metg L-α-methylglutamine Mgln L-α-methylglutamateMglu L-α-methylhistidine Mhis L-α-methylhomo phenylalanine MhpheL-α-methylisoleucine Mile N-(2-methylthioethyl)glycine NmetD-N-methylglutamine Dnmgln N-(3-guanidinopropyl)glycine NargD-N-methylglutamate Dnmglu N-(1-hydroxyethyl)glycine NthrD-N-methylhistidine Dnmhis N-(hydroxyethyl)glycine NserD-N-methylisoleucine Dnmile 
N-(imidazolylethyl)glycine NhisD-N-methylleucine Dnmleu N-(3-indolylyethyl)glycine NhtrpD-N-methyllysine Dnmlys N-methyl-γ-aminobutyrate NmgabuN-methylcyclohexylalanine Nmchexa D-N-methylmethionine DnmmetD-N-methylornithine Dnmorn N-methylcyclopentylalanine NmcpenN-methylglycine Nala D-N-methylphenylalanine DnmpheN-methylaminoisobutyrate Nmaib D-N-methylproline DnmproN-(1-methylpropyl)glycine Nile D-N-methylserine DnmserN-(2-methylpropyl)glycine Nleu D-N-methylthreonine DnmthrD-N-methyltryptophan Dnmtrp N-(1-methylethyl)glycine NvalD-N-methyltyrosine Dnmtyr N-methyla-napthylalanine NmanapD-N-methylvaline Dnmval N-methylpenicillamine Nmpen γ-aminobutyric acidGabu N-(p-hydroxyphenyl)glycine Nhtyr L-t-butylglycine TbugN-(thiomethyl)glycine Ncys L-ethylglycine Etg penicillamine PenL-homophenylalanine Hphe L-α-methylalanine Mala L-α-methylarginine MargL-α-methylasparagine Masn L-α-methylaspartate MaspL-α-methyl-t-butylglycine Mtbug L-α-methylcysteine McysL-methylethylglycine Metg L-α-methylglutamine Mgln L-α-methylglutamateMglu L-α-methylhistidine Mhis L-α-methylhomophenylalanine MhpheL-α-methylisoleucine Mile N-(2-methylthioethyl)glycine NmetL-α-methylleucine Mleu L-α-methyllysine Mlys L-α-methylmethionine MmetL-α-methylnorleucine Mnle L-α-methylnorvaline Mnva L-α-methylornithineMorn L-α-methylphenylalanine Mphe L-α-methylproline MproL-α-methylserine mser L-α-methylthreonine Mthr L-α-methylvaline MtrpL-α-methyltyrosine Mtyr L-α-methylleucine Mval NnbhmL-N-methylhomophenylalanine Nmhphe N-(N-(2,2-diphenylethyl)N-(N-(3,3-diphenylpropyl) carbamylmethyl-glycine Nnbhmcarbamylmethyl(1)glycine Nnbhe 1-carboxy-1-(2,2-diphenyl Nmbcethylamino)cyclopropane

Since the peptides of the present invention are preferably utilized in diagnostics which require the peptides to be in soluble form, the peptides of the present invention preferably include one or more non-natural or natural polar amino acids, including but not limited to serine and threonine, which are capable of increasing peptide solubility due to their hydroxyl-containing side chain.

The peptides of the present invention are preferably utilized in a linear form, although it will be appreciated that in cases where cyclization does not severely interfere with peptide characteristics, cyclic forms of the peptide can also be utilized.

The peptides of the present invention can be biochemically synthesized, such as by using standard solid phase techniques. These methods include exclusive solid phase synthesis well known in the art, partial solid phase synthesis methods, fragment condensation and classical solution synthesis. These methods are preferably used when the peptide is relatively short (i.e., 10 kDa) and/or when it cannot be produced by recombinant techniques (i.e., not encoded by a nucleic acid sequence) and therefore involves different chemistry.

Synthetic peptides can be purified by preparative high performance liquid chromatography, and their composition can be confirmed via amino acid sequencing.

In cases where large amounts of the peptides of the present invention are desired, the peptides of the present invention can be generated using recombinant techniques such as described by Bitter et al., (1987) Methods in Enzymol. 153:516-544, Studier et al. (1990) Methods in Enzymol. 185:60-89, Brisson et al. (1984) Nature 310:511-514, Takamatsu et al. (1987) EMBO J. 6:307-311, Coruzzi et al. (1984) EMBO J. 3:1671-1680, Broglie et al., (1984) Science 224:838-843, Gurley et al. (1986) Mol. Cell. Biol. 6:559-565 and Weissbach & Weissbach, 1988, Methods for Plant Molecular Biology, Academic Press, NY, Section VIII, pp 421-463 and also as described above.

Antibodies

“Antibody” refers to a polypeptide ligand that is preferably substantially encoded by an immunoglobulin gene or immunoglobulin genes, or fragments thereof, which specifically binds and recognizes an epitope (e.g., an antigen). The recognized immunoglobulin genes include the kappa and lambda light chain constant region genes, the alpha, gamma, delta, epsilon and mu heavy chain constant region genes, and the myriad immunoglobulin variable region genes. Antibodies exist, e.g., as intact immunoglobulins or as a number of well characterized fragments produced by digestion with various peptidases. This includes, e.g., Fab′ and F(ab)′₂ fragments. The term “antibody,” as used herein, also includes antibody fragments either produced by the modification of whole antibodies or those synthesized de novo using recombinant DNA methodologies. It also includes polyclonal antibodies, monoclonal antibodies, chimeric antibodies, humanized antibodies, or single chain antibodies. “Fc” portion of an antibody refers to that portion of an immunoglobulin heavy chain that comprises one or more heavy chain constant region domains, CH1, CH2 and CH3, but does not include the heavy chain variable region.

The functional fragments of antibodies, such as Fab, F(ab′)2, and Fv that are capable of binding to macrophages, are described as follows: (1) Fab, the fragment which contains a monovalent antigen-binding fragment of an antibody molecule, can be produced by digestion of whole antibody with the enzyme papain to yield an intact light chain and a portion of one heavy chain; (2) Fab′, the fragment of an antibody molecule that can be obtained by treating whole antibody with pepsin, followed by reduction, to yield an intact light chain and a portion of the heavy chain; two Fab′ fragments are obtained per antibody molecule; (3) F(ab′)2, the fragment of the antibody that can be obtained by treating whole antibody with the enzyme pepsin without subsequent reduction; F(ab′)2 is a dimer of two Fab′ fragments held together by two disulfide bonds; (4) Fv, defined as a genetically engineered fragment containing the variable region of the light chain and the variable region of the heavy chain expressed as two chains; and (5) Single chain antibody (“SCA”), a genetically engineered molecule containing the variable region of the light chain and the variable region of the heavy chain, linked by a suitable polypeptide linker as a genetically fused single chain molecule.

Methods of producing polyclonal and monoclonal antibodies as well as fragments thereof are well known in the art (See for example, Harlow and Lane, Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory, New York, 1988, incorporated herein by reference).

Antibody fragments according to the present invention can be prepared by proteolytic hydrolysis of the antibody or by expression in E. coli or mammalian cells (e.g. Chinese hamster ovary cell culture or other protein expression systems) of DNA encoding the fragment. Antibody fragments can be obtained by pepsin or papain digestion of whole antibodies by conventional methods. For example, antibody fragments can be produced by enzymatic cleavage of antibodies with pepsin to provide a 5S fragment denoted F(ab′)2. This fragment can be further cleaved using a thiol reducing agent, and optionally a blocking group for the sulfhydryl groups resulting from cleavage of disulfide linkages, to produce 3.5S Fab′ monovalent fragments. Alternatively, an enzymatic cleavage using pepsin produces two monovalent Fab′ fragments and an Fc fragment directly. These methods are described, for example, by Goldenberg, U.S. Pat. Nos. 4,036,945 and 4,331,647, and references contained therein, which patents are hereby incorporated by reference in their entirety. See also Porter, R. R. [Biochem. J. 73: 119-126 (1959)]. Other methods of cleaving antibodies, such as separation of heavy chains to form monovalent light-heavy chain fragments, further cleavage of fragments, or other enzymatic, chemical, or genetic techniques may also be used, so long as the fragments bind to the antigen that is recognized by the intact antibody.

Fv fragments comprise an association of VH and VL chains. This association may be noncovalent, as described in Inbar et al. [Proc. Nat'l Acad. Sci. USA 69:2659-62 (1972)]. Alternatively, the variable chains can be linked by an intermolecular disulfide bond or cross-linked by chemicals such as glutaraldehyde. Preferably, the Fv fragments comprise VH and VL chains connected by a peptide linker. These single-chain antigen binding proteins (sFv) are prepared by constructing a structural gene comprising DNA sequences encoding the VH and VL domains connected by an oligonucleotide. The structural gene is inserted into an expression vector, which is subsequently introduced into a host cell such as E. coli. The recombinant host cells synthesize a single polypeptide chain with a linker peptide bridging the two V domains. Methods for producing sFvs are described, for example, by Whitlow and Filpula, Methods 2: 97-105 (1991); Bird et al., Science 242:423-426 (1988); Pack et al., Bio/Technology 11:1271-77 (1993); and U.S. Pat. No. 4,946,778, which is hereby incorporated by reference in its entirety.

Another form of an antibody fragment is a peptide coding for a single complementarity-determining region (CDR). CDR peptides (“minimal recognition units”) can be obtained by constructing genes encoding the CDR of an antibody of interest. Such genes are prepared, for example, by using the polymerase chain reaction to synthesize the variable region from RNA of antibody-producing cells. See, for example, Larrick and Fry [Methods, 2: 106-10 (1991)].

Humanized forms of non-human (e.g., murine) antibodies are chimeric molecules of immunoglobulins, immunoglobulin chains or fragments thereof (such as Fv, Fab, Fab′, F(ab′)2 or other antigen-binding subsequences of antibodies) which contain minimal sequence derived from non-human immunoglobulin. Humanized antibodies include human immunoglobulins (recipient antibody) in which residues from a complementary determining region (CDR) of the recipient are replaced by residues from a CDR of a non-human species (donor antibody) such as mouse, rat or rabbit having the desired specificity, affinity and capacity. In some instances, Fv framework residues of the human immunoglobulin are replaced by corresponding non-human residues. Humanized antibodies may also comprise residues which are found neither in the recipient antibody nor in the imported CDR or framework sequences. In general, the humanized antibody will comprise substantially all of at least one, and typically two, variable domains, in which all or substantially all of the CDR regions correspond to those of a non-human immunoglobulin and all or substantially all of the FR regions are those of a human immunoglobulin consensus sequence. The humanized antibody optimally also will comprise at least a portion of an immunoglobulin constant region (Fc), typically that of a human immunoglobulin [Jones et al., Nature, 321:522-525 (1986); Riechmann et al., Nature, 332:323-329 (1988); and Presta, Curr. Op. Struct. Biol., 2:593-596 (1992)].

Methods for humanizing non-human antibodies are well known in the art. Generally, a humanized antibody has one or more amino acid residues introduced into it from a source which is non-human. These non-human amino acid residues are often referred to as import residues, which are typically taken from an import variable domain. Humanization can be essentially performed following the method of Winter and co-workers [Jones et al., Nature, 321:522-525 (1986); Riechmann et al., Nature 332:323-327 (1988); Verhoeyen et al., Science, 239:1534-1536 (1988)], by substituting rodent CDRs or CDR sequences for the corresponding sequences of a human antibody. Accordingly, such humanized antibodies are chimeric antibodies (U.S. Pat. No. 4,816,567), wherein substantially less than an intact human variable domain has been substituted by the corresponding sequence from a non-human species. In practice, humanized antibodies are typically human antibodies in which some CDR residues and possibly some FR residues are substituted by residues from analogous sites in rodent antibodies.

Human antibodies can also be produced using various techniques known in the art, including phage display libraries [Hoogenboom and Winter, J. Mol. Biol., 227:381 (1991); Marks et al., J. Mol. Biol., 222:581 (1991)]. The techniques of Cole et al. and Boerner et al. are also available for the preparation of human monoclonal antibodies [Cole et al., Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, p. 77 (1985) and Boerner et al., J. Immunol., 147(1):86-95 (1991)]. Similarly, human antibodies can be made by introduction of human immunoglobulin loci into transgenic animals, e.g., mice in which the endogenous immunoglobulin genes have been partially or completely inactivated. Upon challenge, human antibody production is observed, which closely resembles that seen in humans in all respects, including gene rearrangement, assembly, and antibody repertoire. This approach is described, for example, in U.S. Pat. Nos. 5,545,807; 5,545,806; 5,569,825; 5,625,126; 5,633,425; 5,661,016, and in the following scientific publications: Marks et al., Bio/Technology 10: 779-783 (1992); Lonberg et al., Nature 368: 856-859 (1994); Morrison, Nature 368: 812-13 (1994); Fishwild et al., Nature Biotechnology 14: 845-51 (1996); Neuberger, Nature Biotechnology 14: 826 (1996); and Lonberg and Huszar, Intern. Rev. Immunol. 13: 65-93 (1995).

Preferably, the antibody of this aspect of the present invention specifically binds at least one epitope of the polypeptide variants of the present invention. As used herein, the term “epitope” refers to any antigenic determinant on an antigen to which the paratope of an antibody binds.

Epitopic determinants usually consist of chemically active surface groupings of molecules such as amino acids or carbohydrate side chains and usually have specific three dimensional structural characteristics, as well as specific charge characteristics.

Optionally, a unique epitope may be created in a variant due to a change in one or more post-translational modifications, including but not limited to glycosylation and/or phosphorylation, as described below. Such a change may also cause a new epitope to be created, for example through removal of glycosylation at a particular site.

An epitope according to the present invention may also optionally comprise part or all of a unique sequence portion of a variant according to the present invention in combination with at least one other portion of the variant which is not contiguous to the unique sequence portion in the linear polypeptide itself, yet which are able to form an epitope in combination. One or more unique sequence portions may optionally combine with one or more other non-contiguous portions of the variant (including a portion which may have high homology to a portion of the known protein) to form an epitope.

Immunoassays

In another embodiment of the present invention, an immunoassay can be used to qualitatively or quantitatively detect and analyze markers in a sample. This method comprises: providing an antibody that specifically binds to a marker; contacting a sample with the antibody; and detecting the presence of a complex of the antibody bound to the marker in the sample.

To prepare an antibody that specifically binds to a marker, purified protein markers can be used. Antibodies that specifically bind to a protein marker can be prepared using any suitable methods known in the art.

After the antibody is provided, a marker can be detected and/or quantified using any of a number of well recognized immunological binding assays. Useful assays include, for example, an enzyme immune assay (EIA) such as enzyme-linked immunosorbent assay (ELISA), a radioimmune assay (RIA), a Western blot assay, or a slot blot assay (see, e.g., U.S. Pat. Nos. 4,366,241; 4,376,110; 4,517,288; and 4,837,168). Generally, a sample obtained from a subject can be contacted with the antibody that specifically binds the marker.

Optionally, the antibody can be fixed to a solid support to facilitate washing and subsequent isolation of the complex, prior to contacting the antibody with a sample. Examples of solid supports include but are not limited to glass or plastic in the form of, e.g., a microtiter plate, a stick, a bead, or a microbead. Antibodies can also be attached to a solid support.

After incubating the sample with antibodies, the mixture is washed and the antibody-marker complex formed can be detected. This can be accomplished by incubating the washed mixture with a detection reagent. Alternatively, the marker in the sample can be detected using an indirect assay, wherein, for example, a second, labeled antibody is used to detect bound marker-specific antibody, and/or in a competition or inhibition assay wherein, for example, a monoclonal antibody which binds to a distinct epitope of the marker is incubated simultaneously with the mixture.

Throughout the assays, incubation and/or washing steps may be required after each combination of reagents. Incubation steps can vary from about 5 seconds to several hours, preferably from about 5 minutes to about 24 hours. However, the incubation time will depend upon the assay format, marker, volume of solution, concentrations and the like. Usually the assays will be carried out at ambient temperature, although they can be conducted over a range of temperatures, such as 10° C. to 40° C.

The immunoassay can be used to determine a test amount of a marker in a sample from a subject. First, a test amount of a marker in a sample can be detected using the immunoassay methods described above. If a marker is present in the sample, it will form an antibody-marker complex with an antibody that specifically binds the marker under suitable incubation conditions described above. The amount of an antibody-marker complex can optionally be determined by comparing to a standard. As noted above, the test amount of marker need not be measured in absolute units, as long as the unit of measurement can be compared to a control amount and/or signal.

Preferably used are antibodies which specifically interact with the polypeptides of the present invention and not with wild type proteins or other isoforms thereof, for example. Such antibodies are directed, for example, to the unique sequence portions of the polypeptide variants of the present invention, including but not limited to bridges, heads, tails and insertions described in greater detail below. Preferred embodiments of antibodies according to the present invention are described in greater detail with regard to the section entitled “Antibodies”.

Radio-immunoassay (RIA): In one version, this method involves precipitation of the desired substrate and in the methods detailed hereinbelow, with a specific antibody and radiolabelled antibody binding protein (e.g., protein A labeled with I¹²⁵) immobilized on a precipitable carrier such as agarose beads. The number of counts in the precipitated pellet is proportional to the amount of substrate.

In an alternate version of the RIA, a labeled substrate and an unlabelled antibody binding protein are employed. A sample containing an unknown amount of substrate is added in varying amounts. The decrease in precipitated counts from the labeled substrate is proportional to the amount of substrate in the added sample.

Enzyme linked immunosorbent assay (ELISA): This method involves fixation of a sample (e.g., fixed cells or a proteinaceous solution) containing a protein substrate to a surface such as a well of a microtiter plate. A substrate specific antibody coupled to an enzyme is applied and allowed to bind to the substrate. Presence of the antibody is then detected and quantitated by a colorimetric reaction employing the enzyme coupled to the antibody. Enzymes commonly employed in this method include horseradish peroxidase and alkaline phosphatase. If well calibrated and within the linear range of response, the amount of substrate present in the sample is proportional to the amount of color produced. A substrate standard is generally employed to improve quantitative accuracy.
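
By way of non-limiting illustration only, the use of a substrate standard within the linear range of response can be sketched as a straight-line calibration: known standard amounts are fitted against their measured color signal, and the amount in an unknown sample is read back from the fitted line. The function names fit_line and estimate_amount, the example numbers and the choice of an ordinary least-squares line are assumptions made for this sketch and are not taken from the present specification.

```python
# Illustrative sketch only: linear standard curve for a colorimetric readout.
# Assumptions: the response is linear over the calibrated range and the
# standards below are made-up numbers, not experimental data.


def fit_line(amounts, signals):
    """Ordinary least-squares fit of signal = slope * amount + intercept."""
    n = len(amounts)
    if n < 2 or n != len(signals):
        raise ValueError("need at least two paired standard points")
    mean_x = sum(amounts) / n
    mean_y = sum(signals) / n
    sxx = sum((x - mean_x) ** 2 for x in amounts)
    if sxx == 0:
        raise ValueError("standard amounts must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts, signals))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_amount(signal, slope, intercept, linear_range):
    """Invert the standard curve; refuse signals outside the linear range."""
    low, high = linear_range
    if not (low <= signal <= high):
        raise ValueError("signal outside the calibrated linear range")
    return (signal - intercept) / slope


if __name__ == "__main__":
    standard_amounts = [0.0, 1.0, 2.0, 4.0]   # hypothetical units
    standard_signals = [0.1, 0.3, 0.5, 0.9]   # hypothetical absorbance
    slope, intercept = fit_line(standard_amounts, standard_signals)
    print(estimate_amount(0.7, slope, intercept, (0.1, 0.9)))
```

Rejecting signals outside the calibrated range reflects the condition stated above that proportionality holds only within the linear range of response; samples outside it would ordinarily be diluted and re-measured.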

Western blot: This method involves separation of a substrate from other proteins by means of an acrylamide gel followed by transfer of the substrate to a membrane (e.g., nylon or PVDF). Presence of the substrate is then detected by antibodies specific to the substrate, which are in turn detected by antibody binding reagents. Antibody binding reagents may be, for example, protein A, or other antibodies. Antibody binding reagents may be radiolabelled or enzyme linked as described hereinabove. Detection may be by autoradiography, colorimetric reaction or chemiluminescence. This method allows both quantitation of an amount of substrate and determination of its identity by a relative position on the membrane which is indicative of a migration distance in the acrylamide gel during electrophoresis.

Immunohistochemical analysis: This method involves detection of a substrate in situ in fixed cells by substrate specific antibodies. The substrate specific antibodies may be enzyme linked or linked to fluorophores. Detection is by microscopy and subjective evaluation. If enzyme linked antibodies are employed, a colorimetric reaction may be required.

Fluorescence activated cell sorting (FACS): This method involves detection of a substrate in situ in cells by substrate specific antibodies. The substrate specific antibodies are linked to fluorophores. Detection is by means of a cell sorting machine which reads the wavelength of light emitted from each cell as it passes through a light beam. This method may employ two or more antibodies simultaneously.

Radio-Imaging Methods

These methods include, but are not limited to, positron emission tomography (PET) and single photon emission computed tomography (SPECT). Both of these techniques are non-invasive, and can be used to detect and/or measure a wide variety of tissue events and/or functions, such as detecting cancerous cells for example. Unlike PET, SPECT can optionally be used with two labels simultaneously. SPECT has some other advantages as well, for example with regard to cost and the types of labels that can be used. For example, U.S. Pat. No. 6,696,686 describes the use of SPECT for detection of breast cancer, and is hereby incorporated by reference as if fully set forth herein.

Display Libraries

According to still another aspect of the present invention there is provided a display library comprising a plurality of display vehicles (such as phages, viruses or bacteria) each displaying at least 6, at least 7, at least 8, at least 9, at least 10, 10-15, 12-17, 15-20, 15-30 or 20-50 consecutive amino acids derived from the polypeptide sequences of the present invention.
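
By way of non-limiting illustration only, the enumeration of runs of consecutive amino acids of a given length from a polypeptide sequence can be sketched as a sliding window over the sequence. The function name consecutive_fragments and the example sequence are hypothetical; the sketch enumerates candidate fragments only and says nothing about how a display vehicle is actually constructed.

```python
# Illustrative sketch only: list every run of `length` consecutive residues.
# Assumption: the polypeptide is given as a one-letter-code string.


def consecutive_fragments(sequence: str, length: int):
    """Return all windows of `length` consecutive amino acids, in order."""
    if length < 1:
        raise ValueError("length must be at least 1")
    # A sequence shorter than `length` yields an empty list.
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]


if __name__ == "__main__":
    example = "MKTAYIAK"  # made-up sequence, not from the specification
    for fragment in consecutive_fragments(example, 6):
        print(fragment)
```

A sequence of n residues yields n - length + 1 such fragments, so the number of candidates grows only linearly with the length of the polypeptide.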

Methods of constructing such display libraries are well known in the art. Such methods are described in, for example, Young A C, et al., “The three-dimensional structures of a polysaccharide binding antibody to Cryptococcus neoformans and its complex with a peptide from a phage display library: implications for the identification of peptide mimotopes” J Mol Biol 1997 Dec. 12; 274(4):622-34; Giebel L B et al. “Screening of cyclic peptide phage libraries identifies ligands that bind streptavidin with high affinities” Biochemistry 1995 Nov. 28; 34(47):15430-5; Davies E L et al., “Selection of specific phage-display antibodies using libraries derived from chicken immunoglobulin genes” J Immunol Methods 1995 Oct. 12; 186(1):125-35; Jones C R T et al. “Current trends in molecular recognition and bioseparation” J Chromatogr A 1995 Jul. 14; 707(1):3-22; Deng S J et al. “Basis for selection of improved carbohydrate-binding single-chain antibodies from synthetic gene libraries” Proc Natl Acad Sci USA 1995 May 23; 92(11):4992-6; and Deng S J et al. “Selection of antibody single-chain variable fragments with improved carbohydrate binding by phage display” J Biol Chem 1994 Apr. 1; 269(13):9533-8, which are incorporated herein by reference.

The following sections relate to Candidate Marker Examples (first section) and to Experimental Data for these Marker Examples (second section).

Candidate Marker Examples Section

This Section relates to Examples of sequences according to the present invention, including illustrative methods of selection thereof.

Description of the methodology undertaken to uncover the biomolecular sequences of the present invention

Human ESTs and cDNAs were obtained from GenBank version 136 (Jun. 15, 2003 ftp dot ncbi dot nih dot gov/genbank/release dot notes/gb136 dot release dot notes); NCBI genome assembly of April 2003; RefSeq sequences from June 2003; GenBank version 139 (December 2003); Human Genome from NCBI (Build 34) (from October 2003); and RefSeq sequences from December 2003; and from the LifeSeq library of Incyte Corporation (ESTs only; Wilmington, Del., USA). With regard to GenBank sequences, the human EST sequences from the EST (GBEST) section and the human mRNA sequences from the primate (GBPRI) section were used; also the human nucleotide RefSeq mRNA sequences were used (see for example dot ncbi dot nlm dot nih dot gov/Genbank/GenbankOverview dot html and for a reference to the EST section, see dot ncbi dot nlm dot nih dot gov/dbEST/; a general reference to dbEST, the EST database in GenBank, may be found in Boguski et al, Nat Genet. 1993 August; 4(4):332-3; all of which are hereby incorporated by reference as if fully set forth herein).

Novel splice variants were predicted using the LEADS clustering and assembly system as described in Sorek, R., Ast, G. & Graur, D. Alu-containing exons are alternatively spliced. Genome Res 12, 1060-7 (2002); U.S. Pat. No. 6,625,545; and U.S. patent application No. 10/426,002, published as US20040101876 on May 27, 2004; all of which are hereby incorporated by reference as if fully set forth herein. Briefly, the software cleans the expressed sequences from repeats, vectors and immunoglobulins. It then aligns the expressed sequences to the genome, taking alternative splicing into account, and clusters overlapping expressed sequences into “clusters” that represent genes or partial genes.

These were annotated using the GeneCarta (Compugen, Tel-Aviv, Israel) platform. The GeneCarta platform includes a rich pool of annotations, sequence information (particularly of spliced sequences), chromosomal information, alignments, and additional information such as SNPs, gene ontology terms, expression profiles, functional analyses, detailed domain structures, known and predicted proteins and detailed homology reports.

A brief explanation is provided with regard to the method of selecting the candidates. However, it should be noted that this explanation is provided for descriptive purposes only, and is not intended to be limiting in any way. The potential markers were identified by a computational process that was designed to find genes and/or their splice variants that are over-expressed in tumor tissues, by using databases of expressed sequences. Various parameters related to the information in the EST libraries, determined according to a manual classification process, were used to assist in locating genes and/or splice variants thereof that are over-expressed in cancerous tissues. The detailed description of the selection method is presented in Example 1 below. The cancer biomarkers selection engine and the following wet validation stages are schematically summarized in FIG. 1.

Example 1 Identification of Differentially Expressed Gene Products—Algorithm

In order to distinguish between differentially expressed gene products and constitutively expressed genes (i.e., housekeeping genes), an algorithm based on an analysis of frequencies was configured. A specific algorithm for identification of transcripts over-expressed in cancer is described hereinbelow.

Dry analysis

Library annotation—EST libraries are manually classified according to:

-   -   Tissue origin
    -   Biological source—Examples of frequently used biological sources for construction of EST libraries include cancer cell-lines; normal tissues; cancer tissues; fetal tissues; and others such as normal cell lines and pools of normal cell-lines, cancer cell-lines and combinations thereof. A specific description of abbreviations used below with regard to these tissues/cell lines etc. is given above.
    -   Protocol of library construction—various methods are known in the art for library construction including normalized library construction; non-normalized library construction; subtracted libraries; ORESTES and others. It will be appreciated that at times the protocol of library construction is not indicated.

The following rules are followed:

EST libraries originating from identical biological samples are considered as a single library.

EST libraries which included above-average levels of contamination, such as DNA contamination for example, were eliminated. The presence of such contamination was determined as follows. For each library, the number of unspliced ESTs that are not fully contained within other spliced sequences was counted. If the percentage of such sequences (as compared to all other sequences) was at least 4 standard deviations above the average for all libraries being analyzed, this library was tagged as being contaminated and was eliminated from further consideration in the below analysis (see also Sorek, R. & Safer, H. M. A novel algorithm for computational identification of contaminated EST libraries. Nucleic Acids Res 31, 1067-74 (2003) for further details).
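By way of a non-limiting illustration, the contamination rule described above may be sketched as follows. The function name, the input layout (a mapping from library name to its percentage of unspliced, uncontained ESTs), the use of the population standard deviation and the sample values are assumptions made for this sketch only and are not taken from the cited algorithm.

```python
from statistics import mean, pstdev


def flag_contaminated(unspliced_pct, n_sd=4.0):
    """Return the libraries whose unspliced-EST percentage is at least
    n_sd standard deviations above the mean over all libraries."""
    values = list(unspliced_pct.values())
    mu = mean(values)
    sd = pstdev(values)  # assumption: population SD over the analyzed libraries
    if sd == 0:
        return set()  # all libraries identical, so nothing stands out
    cutoff = mu + n_sd * sd
    return {name for name, pct in unspliced_pct.items() if pct >= cutoff}


# Illustrative data only: 30 ordinary libraries and one outlier.
libraries = {f"lib_{i:02d}": 2.0 + 0.5 * (i % 5) for i in range(30)}
libraries["lib_outlier"] = 40.0
flagged = flag_contaminated(libraries)
```

Because the outlier itself enters the mean and the standard deviation, a 4 standard deviation cutoff can only be reached when a reasonably large number of libraries is analyzed together, which is why the illustrative data uses 31 libraries.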

Clusters (genes) having at least five sequences including at least two sequences from the tissue of interest were analyzed. Splice variants were identified by using the LEADS software package as described above.

Example 2 Identification of Genes Over Expressed in Cancer.

Two different scoring algorithms were developed.

Libraries score—candidate sequences which are supported by a number of cancer libraries are more likely to serve as specific and effective diagnostic markers.

The basic algorithm—for each cluster the number of cancer and normal libraries contributing sequences to the cluster was counted. The Fisher exact test was used to check if cancer libraries are significantly over-represented in the cluster as compared to the total number of cancer and normal libraries.

Library counting: Small libraries (e.g., less than 1000 sequences) were excluded from consideration unless they participate in the cluster. For this reason, the total number of libraries is actually adjusted for each cluster.
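As a hedged illustration of the libraries score, a one-sided Fisher exact test on a 2x2 table may be sketched as below. The table layout (libraries in the cluster versus the remaining libraries, split into cancer and normal), the function name and the counts are assumptions for illustration; the per-cluster adjustment of the library totals described above is not modeled.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact P-value for the 2x2 table [[a, b], [c, d]].

    a: cancer libraries in the cluster    b: normal libraries in the cluster
    c: cancer libraries not in cluster    d: normal libraries not in cluster
    Returns the probability of observing a or more cancer libraries in the
    cluster when the row and column totals are held fixed.
    """
    in_cluster = a + b
    cancer_total = a + c
    total = a + b + c + d
    k_max = min(in_cluster, cancer_total)
    favorable = sum(
        comb(cancer_total, k) * comb(total - cancer_total, in_cluster - k)
        for k in range(a, k_max + 1)
    )
    return favorable / comb(total, in_cluster)


# Illustrative counts: 8 of 100 cancer libraries and 1 of 100 normal
# libraries contribute sequences to the cluster.
p_value = fisher_exact_greater(8, 1, 92, 99)
```

A small P-value here indicates that cancer libraries are over-represented among the libraries contributing to the cluster, relative to their share of all libraries considered.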

Clones no. score—Generally, when the number of ESTs is much higher in the cancer libraries relative to the normal libraries it might indicate actual over-expression.

The algorithm—

Clone counting: For counting EST clones each library protocol class was given a weight based on our belief of how much the protocol reflects actual expression levels:

(i) non-normalized: 1

(ii) normalized: 0.2

(iii) all other classes: 0.1

Clones number score—The total weighted number of EST clones from cancer libraries was compared to the EST clones from normal libraries. To avoid cases where one library contributes to the majority of the score, the contribution of the library that gives most clones for a given cluster was limited to 2 clones.

The score was computed as

$\frac{c + 1}{C}/\frac{n + 1}{N}$

where:

c—weighted number of “cancer” clones in the cluster.

C—weighted number of clones in all “cancer” libraries.

n—weighted number of “normal” clones in the cluster.

N—weighted number of clones in all “normal” libraries.
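The weighting and the score defined above may be illustrated by the following minimal sketch. The function names, the dictionary layout and the clone counts are hypothetical, and the limit of 2 clones on the largest contributing library is deliberately not modeled here.

```python
# Protocol weights as listed above; any other protocol class gets 0.1.
PROTOCOL_WEIGHT = {"non-normalized": 1.0, "normalized": 0.2}
OTHER_WEIGHT = 0.1


def weighted_clones(counts_by_protocol):
    """Sum clone counts after applying the weight of each library protocol."""
    return sum(
        PROTOCOL_WEIGHT.get(protocol, OTHER_WEIGHT) * count
        for protocol, count in counts_by_protocol.items()
    )


def clones_score(c, C, n, N):
    """Score ((c + 1) / C) / ((n + 1) / N) with the symbols defined above."""
    return ((c + 1) / C) / ((n + 1) / N)


# Illustrative numbers only.
c = weighted_clones({"non-normalized": 9, "normalized": 5})  # 9 + 0.2 * 5
n = weighted_clones({"non-normalized": 1})
score = clones_score(c, C=50_000, n=n, N=100_000)
```

The "+1" terms keep the ratio defined when a cluster contains no normal clones, so a cluster seen only in cancer libraries still receives a finite score.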

Clones number score significance—The Fisher exact test was used to check if EST clones from cancer libraries are significantly over-represented in the cluster as compared to the total number of EST clones from cancer and normal libraries.

Two search approaches were used to find either general cancer-specific candidates or tumor specific candidates.

-   -   Libraries/sequences originating from tumor tissues are counted as well as libraries originating from cancer cell-lines (“normal” cell-lines were ignored).
    -   Only libraries/sequences originating from tumor tissues are counted.

Example 3 Identification of Tissue Specific Genes

For detection of tissue specific clusters, tissue libraries/sequences were compared to the total number of libraries/sequences in the cluster. Statistical tools similar to those described above were employed to identify tissue specific genes. Tissue abbreviations are the same as for cancerous tissues, but are indicated with the header “normal tissue”.

The algorithm—for each tested tissue T and for each tested cluster the following were examined:

1. Each cluster includes at least 2 libraries from the tissue T and at least 3 clones (weighted, as described above) from tissue T in the cluster; and

2. Clones from the tissue T are at least 40% of all the clones participating in the tested cluster.

Fisher exact test P-values were computed both for library and weighted clone counts to check that the counts are statistically significant.
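The two numeric criteria above can be expressed, purely as an illustrative sketch, by a simple filter. The function and parameter names are assumptions, and the Fisher exact test significance check is not included.

```python
def passes_tissue_filter(tissue_libraries, tissue_clones, total_clones,
                         min_libraries=2, min_clones=3.0, min_fraction=0.40):
    """Apply the library, weighted-clone and 40% criteria for one tissue T.

    tissue_libraries: libraries from tissue T contributing to the cluster
    tissue_clones:    weighted clones from tissue T in the cluster
    total_clones:     weighted clones from all tissues in the cluster
    """
    if total_clones <= 0:
        return False
    return (
        tissue_libraries >= min_libraries
        and tissue_clones >= min_clones
        and tissue_clones / total_clones >= min_fraction
    )


# Illustrative cases only.
accepted = passes_tissue_filter(2, 5.0, 10.0)      # 50% of clones from T
too_few_libs = passes_tissue_filter(1, 5.0, 10.0)  # only one library
too_diluted = passes_tissue_filter(3, 3.0, 10.0)   # 30% of clones from T
```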

Example 4

Identification of Splice Variants Over Expressed in Cancer of Clusters which are Not Over Expressed in Cancer

Cancer-specific splice variants containing a unique region were identified.

Identification of Unique Sequence Regions in Splice Variants

A Region is defined as a group of adjacent exons that always appear or do not appear together in each splice variant.

A “segment” (sometimes also referred to as “seg” or “node”) is defined as the shortest contiguous transcribed region without known splicing inside.

Only reliable ESTs were considered for region and segment analysis. An EST was defined as unreliable if:

(i) Unspliced;

(ii) Not covered by RNA;

(iii) Not covered by spliced ESTs; and

(iv) Alignment to the genome ends in proximity of long poly-A stretch or starts in proximity of long poly-T stretch.

Only reliable regions were selected for further scoring. Unique sequence regions were considered reliable if:

(i) Aligned to the genome; and

(ii) Regions supported by more than 2 ESTs.

The Algorithm

Each unique sequence region divides the set of transcripts into 2 groups:

(i) Transcripts containing this region (group TA).

(ii) Transcripts not containing this region (group TB).

The set of EST clones of every cluster is divided into 3 groups:

(i) Supporting (originating from) transcripts of group TA (S1).

(ii) Supporting transcripts of group TB (S2).

(iii) Supporting transcripts from both groups (S3).

Library and clones number scores described above were given to S1 group.

Fisher Exact Test P-values were used to check if:

S1 is significantly enriched by cancer EST clones compared to S2; and

S1 is significantly enriched by cancer EST clones compared to cluster background (S1+S2+S3).

Identification of unique sequence regions and division of the group of transcripts accordingly is illustrated in FIG. 2. Each of these unique sequence regions corresponds to a segment, also termed herein a “node”.

Region 1: common to all transcripts, thus it is not considered for detecting variants; Region 2: specific to Transcript 1; Region 3: specific to Transcripts 2 and 3; Region 4: specific to Transcript 3; Region 5: specific to Transcript 1 and 2; Region 6: specific to Transcript 1.
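The division into groups TA and TB, and the assignment of EST clones to S1, S2 and S3, may be illustrated with the region layout listed above for FIG. 2. The data structures and function names are assumptions for this sketch, and the way a clone is mapped to the transcripts it supports is taken as a given (hypothetical) input.

```python
# Region content of each transcript, as listed for FIG. 2 above.
TRANSCRIPTS = {
    "Transcript 1": {1, 2, 5, 6},
    "Transcript 2": {1, 3, 5},
    "Transcript 3": {1, 3, 4},
}


def split_by_region(transcripts, region):
    """Return (TA, TB): transcripts containing / not containing the region."""
    ta = {name for name, regions in transcripts.items() if region in regions}
    tb = set(transcripts) - ta
    return ta, tb


def classify_clone(supported_transcripts, ta, tb):
    """Assign a clone to S1 (TA only), S2 (TB only) or S3 (both groups)."""
    in_ta = bool(supported_transcripts & ta)
    in_tb = bool(supported_transcripts & tb)
    if in_ta and in_tb:
        return "S3"
    if in_ta:
        return "S1"
    if in_tb:
        return "S2"
    return None  # clone supports none of the transcripts


ta, tb = split_by_region(TRANSCRIPTS, region=3)
groups = [
    classify_clone({"Transcript 3"}, ta, tb),
    classify_clone({"Transcript 1"}, ta, tb),
    classify_clone({"Transcript 1", "Transcript 2"}, ta, tb),
]
```

For Region 1, which is common to all transcripts, TB is empty, which is consistent with that region not being considered for detecting variants.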

Example 5

Identification of Cancer Specific Splice Variants of Genes Over Expressed in Cancer

A search was performed for EST supported (no mRNA) regions for genes of:

(i) Known cancer markers; and

(ii) Genes shown to be over-expressed in cancer in published micro-array experiments.

Reliable EST supported-regions were defined as supported by a minimum of one of the following:

(i) 3 spliced ESTs;

(ii) 2 spliced ESTs from 2 libraries;

(iii) 10 unspliced ESTs from 2 libraries; or

(iv) 3 libraries.
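One possible reading of the four support criteria above is sketched below. The parameter names are hypothetical, and it is an assumption of this sketch that criterion (iv) refers to ESTs originating from at least 3 distinct libraries and that the library counts in (ii) and (iii) refer to the libraries contributing the spliced and unspliced ESTs respectively.

```python
def is_reliable_region(spliced_ests, spliced_libraries,
                       unspliced_ests, unspliced_libraries,
                       total_libraries):
    """Return True if at least one of the support criteria (i)-(iv) holds."""
    return (
        spliced_ests >= 3                                        # (i)
        or (spliced_ests >= 2 and spliced_libraries >= 2)        # (ii)
        or (unspliced_ests >= 10 and unspliced_libraries >= 2)   # (iii)
        or total_libraries >= 3                                  # (iv)
    )


# Illustrative cases only.
by_spliced_count = is_reliable_region(3, 1, 0, 0, 1)
by_unspliced = is_reliable_region(0, 0, 10, 2, 2)
not_supported = is_reliable_region(2, 1, 9, 2, 2)
```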

Actual Marker Examples

The following examples relate to specific actual marker examples.

Experimental Examples Section

This Section relates to Examples describing experiments involving these sequences, and illustrative, non-limiting examples of methods, assays and uses thereof. The materials and experimental procedures are explained first, as all experiments used them as a basis for the work that was performed.

The markers of the present invention were tested with regard to their expression in various cancerous and non-cancerous tissue samples. A description of the samples used in the lung cancer panel is provided in Tables 2 and 2_1, below. A description of the samples used in the normal tissue panel is provided in Tables 3 and 3_1, below. The key for Table 2_1 is provided in Table 2_1_1 below. Tests were then performed as described in the “Materials and Experimental Procedures” section below.

TABLE 2 Tissue samples in testing panel

| sample rename | Lot No. | source | pathology | Grade | gender/age |
|---|---|---|---|---|---|
| 1-B-Adeno G1 | A504117 | Biochain | Adenocarcinoma | 1 | F/29 |
| 2-B-Adeno G1 | A504118 | Biochain | Adenocarcinoma | 1 | M/64 |
| 95-B-Adeno G1 | A610063 | Biochain | Adenocarcinoma | 1 | F/54 |
| 12-B-Adeno G2 | A504119 | Biochain | Adenocarcinoma | 2 | F/74 |
| 75-B-Adeno G2 | A609217 | Biochain | Adenocarcinoma | 2 | M/65 |
| 77-B-Adeno G2 | A608301 | Biochain | Adenocarcinoma | 2 | M/44 |
| 13-B-Adeno G2-3 | A504116 | Biochain | Adenocarcinoma | 2-3 | M/64 |
| 89-B-Adeno G2-3 | A609077 | Biochain | Adenocarcinoma | 2-3 | M/62 |
| 76-B-Adeno G3 | A609218 | Biochain | Adenocarcinoma | 3 | M/57 |
| 94-B-Adeno G3 | A610118 | Biochain | Adenocarcinoma | 3 | M/68 |
| 3-CG-Adeno | CG-200 | Ichilov | Adenocarcinoma | | NA |
| 14-CG-Adeno | CG-111 | Ichilov | Adenocarcinoma | | M/68 |
| 15-CG-Bronch adeno | CG-244 | Ichilov | Bronchioloalveolar adenocarcinoma | | M/74 |
| 45-B-Alvelous Adeno | A501221 | Biochain | Alveolus carcinoma | | F/50 |
| 44-B-Alvelous Adeno G2 | A501123 | Biochain | Alveolus carcinoma | 2 | F/61 |
| 19-B-Squamous G1 | A408175 | Biochain | Squamous carcinoma | 1 | M/78 |
| 16-B-Squamous G2 | A409091 | Biochain | Squamous carcinoma | 2 | F/68 |
| 17-B-Squamous G2 | A503183 | Biochain | Squamous carcinoma | 2 | M/57 |
| 21-B-Squamous G2 | A503187 | Biochain | Squamous carcinoma | 2 | M/52 |
| 78-B-Squamous G2 | A607125 | Biochain | Squamous Cell Carcinoma | 2 | M/62 |
| 80-B-Squamous G2 | A609163 | Biochain | Squamous Cell Carcinoma | 2 | M/74 |
| 18-B-Squamous G2-3 | A503387 | Biochain | Squamous Cell Carcinoma | 2-3 | M/63 |
| 81-B-Squamous G3 | A609076 | Biochain | Squamous Carcinoma | 3 | m/53 |
| 79-B-Squamous G3 | A609018 | Biochain | Squamous Cell Carcinoma | 3 | M/67 |
| 20-B-Squamous | A501121 | Biochain | Squamous Carcinoma | | M/64 |
| 22-B-Squamous | A503386 | Biochain | Squamous Carcinoma | | M/48 |
| 88-B-Squamous | A609219 | Biochain | Squamous Cell Carcinoma | | M/64 |
| 100-B-Squamous | A409017 | Biochain | Squamous Carcinoma | | M/64 |
| 23-CG-Squamous | CG-109 (1) | Ichilov | Squamous Carcinoma | | M/65 |
| 24-CG-Squamous | CG-123 | Ichilov | Squamous Carcinoma | | M/76 |
| 25-CG-Squamous | CG-204 | Ichilov | Squamous Carcinoma | | M/72 |
| 87-B-Large cell G3 | A609165 | Biochain | Large Cell Carcinoma | 3 | F/47 |
| 38-B-Large cell | A504113 | Biochain | Large cell | | M/58 |
| 39-B-Large cell | A504114 | Biochain | Large cell | | F/35 |
| 82-B-Large cell | A609170 | Biochain | Large Cell Neuroendocrine Carcinoma | | M/68 |
| 30-B-Small cell carci G3 | A501389 | Biochain | small cell | 3 | M/34 |
| 31-B-Small cell carci G3 | A501390 | Biochain | small cell | 3 | F/59 |
| 32-B-Small cell carci G3 | A501391 | Biochain | small cell | 3 | M/30 |
| 33-B-Small cell carci G3 | A504115 | Biochain | small cell | 3 | M |
| 86-B-Small cell carci G3 | A608032 | Biochain | Small Cell Carcinoma | 3 | F/52 |
| 83-B-Small cell carci | A609162 | Biochain | Small Cell Carcinoma | | F/47 |
| 84-B-Small cell carci | A609167 | Biochain | Small Cell Carcinoma | | F/59 |
| 85-B-Small cell carci | A609169 | Biochain | Small Cell Carcinoma | | M/66 |
| 46-B-N M44 | A501124 | Biochain | Normal M44 | | F/61 |
| 47-B-N | A503205 | Biochain | Normal PM | | M/26 |
| 48-B-N | A503206 | Biochain | Normal PM | | M/44 |
| 49-B-N | A503384 | Biochain | Normal PM | | M/27 |
| 50-B-N | A503385 | Biochain | Normal PM | | M/28 |
| 90-B-N | A608152 | Biochain | Normal (Pool 2) PM | | pool 2 |
| 91-B-N | A607257 | Biochain | Normal (Pool 2) PM | | pool 2 |
| 92-B-N | A503204 | Biochain | Normal PM | | m/28 |
| 93-Am-N | 111P0103A | Ambion | Normal PM | | F/61 |
| 96-Am-N | 36853 | Ambion | Normal PM | | F/43 |
| 97-Am-N | 36854 | Ambion | Normal PM | | M/46 |
| 98-Am-N | 36855 | Ambion | Normal PM | | F/72 |
| 99-Am-N | 36856 | Ambion | Normal PM | | M/31 |

TABLE 2_1 Lung cancer testing panel sample id (GCI)/ case id TISSUE RNA(Asterand)/ ID ID lot (GCI)/ (GCI)/ no. specimen Sample Source/ sample(old ID ID Diag Specimen Tum Tissue Delivery name samples) (Asterand)(Asterand) Diag remarks location Gr TNM CS % Gen LC GCI 1-GC- 7Z9V47Z9V4AYM Aden BC IA 80 F BAC- SIA LC GCI 2-GC- ZW2AQ ZW2AQARP Aden BC IB70 F BAC- SIB LC Bioch 72- A501123 AC 2 UN F (44)- Bc- BAC LC Bioch 73-A501221 AC UN UN F (45)- Bc- BAC LC GCI 4-GC- 3MOPL 3MOPLA79 Aden IA 60M Adeno- SIA LC GCI 5-GC- KOJXD KOJXDAV4 Aden IA 90 F Adeno- SIA LC GCI6-GC- X2Q44 X2Q44A79 Aden IA 85 M Adeno- SIA LC GCI 7-GC- 6BACZ 6BACZAP5Aden IA 60 F Adeno- SIA LC GCI 8-GC- BS9AF BS9AFA3E Aden IA 55 F Adeno-SIA LC GCI 9-GC- UCLOA UCLOAA9L Aden IA 80 F Adeno- SIA LC GCI 10-GC-BVYK3 BVYK3A7Z Aden IA 60 F Adeno- SIA LC GCI 11-GC- U4DM4 U4DM4AFZ AdenIB 65 F Adeno- SIB LC GCI 12-GC- OWX5Y OWX5YA3S Aden IB 90 M Adeno- SIBLC GCI 13-GC- XYY96 XYY96A6B Aden IIA 70 F Adeno- SIIA LC GCI 14-GC-SO7B1 SO7B1AIJ Aden IIA 70 M Adeno- SIIA LC GCI 15-GC- QANSY QANSYACDAden IIIA 65 F Adeno- SIIIA LC Bioch 16- A610063 Aden 1 UN F (95)- BC-Adeno LC Bioch 17- A609077 Aden 2-3 UN M (89)- Bc- Adeno LC Bioch 18-A609218 Aden 3 UN M (76)- Bc- Adeno LC Bioch 74-(2)- A504118 Aden 1 UN MBc- Adeno LC Bioch 76- A609217 Aden 2 UN M (75)- Bc- Adeno LC Bioch 77-A504119 Aden 2 UN F (12)- Bc- Adeno LC Bioch 78- A504116 Aden 2-3 UN M(13)- Bc- Adeno LC Bioch 79- A610118 Aden 3 UN M (94)- Bc- Adeno LCIchilov 80-(3)- CG- Aden UN UN F Ic- 200 Adeno LC Ichilov 81- CG- AdenUN UN M (14)-Ic- 111 Adeno LC Aster 19-As- 9220 9418 9418A1 SQ 1 TXN0M0Occult 80 M Sq-S0 LC GCI 20-GC- U2QHS U2QHSA2N SQ IA 55 F Sq-SIA LC GCI21-GC- TRQR7 TRQR7ACD SQ IB 75 M Sq-SIB LC Aster 22-As- 17581 3260332603B1 SQ 3 T2N0M0 IB 90 M Sq-SIB LC Aster 23-As- 18309 41454 41454B1SQ 2 T2N0MX IB 100 M Sq-SIB LC Aster 24-As- 9217 9415 9415B1 SQ 2 T2N0M0IB 90 M Sq-SIB LC GCI 25-GC- RXQ1P RXQ1PAEA SQ IIB 55 F Sq-SIIB LC GCI26-GC- KB5KH KB5KHA6X SQ IIB 65 M 
Sq-SIIB LC GCI 27-GC- LAYMB LAYMBALFSQ IIIA 65 F Sq- SIIIA LC Ichilov 28- CG- SQ UN UN M (23)-Ic- 109 (1) SqLC Ichilov 29- CG- SQ UN UN M (25)-Ic- 204 Sq LC Bioch 30- A408175 SQ 1UN M (19)- Bc-Sq LC Bioch 31- A607125 SQ 2 UN M (78)- Bc-Sq LC Bioch 32-A409091 SQ 2 UN F (16)- Bc-Sq LC Bioch 33- A609163 SQ 2 UN M (80)- Bc-SqLC Bioch 34- A503387 SQ 2-3 UN M (18)- Bc-Sq LC Bioch 35- A609076 SQ 3UN M (81)- Bc-Sq LC Bioch 82- A503187 SQ 2 UN M (21)- Bc-Sq LC Bioch 83-A503183 SQ 2 UN M (17)- Bc-Sq LC Bioch 84- A609018 SQ 3 UN M (79)- Bc-SqLC Bioch 85- A503386 SQ UN UN M (22)- Bc-Sq LC Bioch 86- A501121 SQ UNUN M (20)- Bc-Sq LC Bioch 87- A609219 SQ UN UN M (88)- Bc-Sq LC Bioch88- A409017 SQ UN UN M (100)- Bc-Sq LC Ichilov 89- CG- SQ UN UN M(24)-Ic- 123 Sq LC GCI 36-GC- AF8AL AF8ALAAL LCC IA 85 M LCC- SIA LC GCI37-GC- O62XU O62XUA1X LCC IB 75 F LCC- SIB LC GCI 38-GC- OLOIM OLOIMAS1LCC IB 70 M LCC- SIB LC GCI 39-GC- 1ZWSV 1ZWSVAB9 LCC IIB 50 M LCC- SIIBLC GCI 40-GC- 2YHOD 2YHODA1H LCC NSCC . . . IIB 95 M LCC- SIIB LC GCI41-GC- 38B4D 38B4DAQK LCC IIB 90 F LCC- SIIB LC Bioch 90- A504114 LCC UNUN F (39)- Bc- LCC LC Bioch 91- A609165 LCC 3 UN F (87)- Bc- LCC LCBioch 92- A504113 LCC UN UN M (38)- Bc- LCC LC Bioch 93- A609170 LCNC UNUN M (82)- Bc- LCC LC GCI 42-GC- QPJQL QPJQLAF6 SCC NC 3 IB 65 F SCC-SIB LC Bioch 43- A501391 SCC UN M (32)- Bc- SCC LC Bioch 44- A501389 SCC3 UN M (30)- Bc- SCC LC Bioch 45- A609162 SCC UN UN F (83)- Bc- SCC LCBioch 46- A608032 SCC 3 UN F (86)- Bc- SCC LC Bioch 47- A501390 SCC UN F(31)- Bc- SCC LC Bioch 48- A609167 SCC UN UN F (84)- Bc- SCC LC Bioch49- A609169 SCC UN UN M (85)- Bc- SCC LC Bioch 50- A504115 SCC UN M(33)- Bc- SCC LN Aster 51-As- 9078 9275 9275B1 Norm-L PS M N-PS LN Aster52-As- 8757 8100 8100B1 Norm-L PM (Right), F N-PM Lobe Inferior LN Aster53-As- 6692 6161 6161A1 Norm-L PM M N-PM LN Aster 54-As- 7900 71807180F1 Norm-L PM F N-PM LN Aster 55-As- 8771 8163 8163A1 Norm-L PM(Left), M N-PM Lobe Superior LN Aster 56-As- 13094 19763 19763A1 
Norm-LPM M N-PM LN Aster 57-As- 19174 40654 40654A2 Norm-L PM F N-PM LN Aster58-As- 13128 19642 19642A1 Norm-L PM F N-PM LN Aster 59-As- 14374 2054820548C1 Norm-L PM (Right), F N-PM Lobe Superior LN Amb 60- 36856 N- PM M(99)- PM Am-N PM LN Amb 61- 36853 N- PM F (96)- PM Am-N PM LN Amb 62-36854 N- PM M (97)- PM Am-N PM LN Amb 63- 111P0103A N- PM- F (93)- PMICH Am-N PM LN Amb 64- 36855 N- PM F (98)- PM Am-N PM LN Bioch 67-A503385 N- PM M (50)- PM Bc-N PM LN Bioch 68- A503204 N- PM M (92)- PMBc-N PM LN Bioch 69- A607257 N-P2- PM P2 (91)- PM Bc-N PM LN Bioch 70-A608152 N-P2 PM P2 (90)- PM Bc-N PM LN Bioch 71- A503206 N- PM M (48)-PM Bc-N PM # of # Y. Cig. Use # Y. Cause Smoking Per of off Sm Sm Dr #Recovery of Exc. Tissue age Ethnic B Status day Tobacco Tobacco PY? pplAl Dr Type Death Y. LC 63 WCAU Prev 20 15 27 N — Y  0 Surg 2001 U. LC 56WCAU Prev 15 28 10 Y 1 Y  6 Surg 2002 U. LC 61 LC 50 LC 68 WCAU Nev U. —— — N — N — Surg 2001 LC 64 WCAU Prev 15 40  7 Y 1 N  0 Surg 2003 U. LC58 WCAU Prev 10 47  0 Y 2 N — Surg 2004 U. LC 65 WCAU Curr 6 30 — Y 1 N— Surg 2004 U. LC 59 WCAU Curr 20 40 — N — N — Surg 2004 U. LC 69 WCAUCurr 30 52 — Y 4 N — Surg 2005 U. LC 60 WCAU Curr 40 40 — N — N — Surg2002 U. LC 68 WCAU Prev  5  4 43 N — N — Surg 2003 U. LC 69 WCAU Curr 10— — — N — Surg 2002 U. LC 62 WCAU Prev  6 40  6 N — Y 0 Surg 2004 U. LC56 WCAU Curr 30 25 — Y 1 N — Surg 2001 U. LC 61 WCAU Curr 30 36 — Y 1 N— Surg 2004 U. LC 54 LC 62 LC 57 LC 64 LC 65 LC 74 LC 64 LC 68 LC 56 LC68 LC 67 CAU Curr 11-20 31-40 O Surg 2003 U. LC 68 WCAU Prev 10 20  0 N— N — Surg 2004 U. LC 62 WCAU Prev 20 50  0 Y 5 N — Surg 2005 U. LC 73CAU Prev O Surg 2004 U. LC 66 CAU Prev. 11-20 45 P Surg 2005 U. LC 65CAU Curr  6-10 41-50 O Surg 2002 U. LC 44 WCAU Prev 20 20  0 Y 2 N —Surg 2004 U. LC 68 WCAU Prev 40 40  0 Y 2 N — Surg 2004 U. LC 58 WCAUPrev 50 40  1 Y 2 N — Surg 2004 U. 
LC 65 LC 72 LC 78 LC 62 LC 68 LC 74LC 63 LC 53 LC 52 LC 57 LC 67 LC 48 LC 64 LC 64 LC 64 LC 76 LC 45 WCAUPrev 45 33  0 Y 2 Y 28 Surg 2004 U. LC 60 WCAU Prev 30 45  0 Y 3 N —Surg 2004 U. LC 68 WCAU Prev — 55 — Y — N — Surg 2001 U. LC 51 WCAU Prev20 12 22 Y 1 N — Surg 2004 U. LC 62 WCAU Prev 40 40  0 Y 2 Y 12 Surg2004 U. LC 70 WCAU Prev 30 50 — Y 2 Y 13 Surg 2002 U. LC 35 LC 47 LC 58LC 68 LC 62 WCAU Prev 20 35    0.15 Y 2 N — Surg 2003 U. LC 30 LC 34 LC47 LC 52 LC 59 LC 59 LC 66 LC LN 22 CAU Nev U. NU Surg 2003 LN 26 CAUNev U. O Aut CA 2003 LN 37 CAU Nev U. C Aut MCE 2002 LN 76 CAU Prev AutCPulA 2002 U. LN 81 CAU Prev 41 or 31-40 O Aut CA 2003 U. more LN  0 CAUPrev 21-40 41-50 P Aut IC U. LN 69 CAU Curr 21-40 31-40 P Aut CPulA 2005U. LN 75 CAU Aut CPulA 2004 LN 75 CAU Aut CerA 2004 LN 31 LN 43 LN 46 LN61 LN 72 LN 28 LN 28 LN 24, 29 LN 27, 28 LN 44

TABLE 2_1_1

| Key | Full Name |
|---|---|
| # Cig. Per day | Number of Cigarettes per day |
| # Dr | Number of Drinks |
| # of Y. Use of Tobacco | Number of Years Using Tobacco |
| # Y. off Tobacco | Number of Years Off Tobacco |
| AC | Alveolus carcinoma |
| Aden | ADENOCARCINOMA |
| Amb | Ambion |
| Aster | Asterand |
| Aut | Autopsy |
| BC | BRONCHIOLOALVEOLAR CARCINOMA |
| Bioch | Biochain |
| C | Current Use |
| CA | Cardiac arrest |
| CAU | Caucasian |
| Cer A | Cerebrovascular accident |
| CPul A | Cardiopulmonary arrest |
| CS | Cancer Stage |
| Curr U. | Current Use |
| Diag | Diagnosis |
| Dr Al | Drink Alcohol? |
| Exc Y. | Excision Year |
| Gen | Gender |
| Gr | Grade |
| Height | HT |
| IC | Ischemic cardiomyopathy |
| LC | Lung Cancer |
| LCC | LARGE CELL CARCINOMA |
| LCNC | Large Cell Neuroendocrine Carcinoma |
| LN | Lung Normal |
| MCE | Massive cerebral edema |
| N | No |
| NC | NEUROENDOCRINE CARCINOMA |
| Nev. U. | Never Used |
| Norm-L | Normal Lung |
| N-P2-PM | Normal (Pool 2)-PM |
| N-PM | Normal-PM |
| NSCC . . . | NON-SMALL CELL CARCINOMA WITH SARCOMUTOUS TRANSFORMTAIO |
| NU | Never used |
| O | Occasional Use |
| P | Previous Use |
| P2 | Pool 2 |
| Prev U. | Previous Use |
| SQ | Squamous Cell Carcinoma |
| Sm P Y? | Have people at home smoked in past 15 yr |
| Sm ppl | If yes, how many? |
| SCC | SMALL CELL CARCINOMA |
| SMOKE_GROWING_UP | Did people smoke at home while growing up |
| Surg | Surgical |
| Tum % | Tumor Percentage |
| WCAU | White Caucasian |
| Y | Yes |

TABLE 3 Tissue samples in normal panel:

| Sample | Lot no. | Source | Tissue | Pathology | Sex/Age |
|---|---|---|---|---|---|
| 1-Am-Colon (C71) | 071P10B | Ambion | Colon | PM | F/43 |
| 2-B-Colon (C69) | A411078 | Biochain | Colon | PM-Pool of 10 | M&F |
| 3-Cl-Colon (C70) | 1110101 | Clontech | Colon | PM-Pool of 3 | M&F |
| 4-Am-Small Intestine | 091P0201A | Ambion | Small Intestine | PM | M/75 |
| 5-B-Small Intestine | A501158 | Biochain | Small Intestine | PM | M/63 |
| 6-B-Rectum | A605138 | Biochain | Rectum | PM | M/25 |
| 7-B-Rectum | A610297 | Biochain | Rectum | PM | M/24 |
| 8-B-Rectum | A610298 | Biochain | Rectum | PM | M/27 |
| 9-Am-Stomach | 110P04A | Ambion | Stomach | PM | M/16 |
| 10-B-Stomach | A501159 | Biochain | Stomach | PM | M/24 |
| 11-B-Esophagus | A603814 | Biochain | Esophagus | PM | M/26 |
| 12-B-Esophagus | A603813 | Biochain | Esophagus | PM | M/41 |
| 13-Am-Pancreas | 071P25C | Ambion | Pancreas | PM | M/25 |
| 14-CG-Pancreas | CG-255-2 | Ichilov | Pancreas | PM | M/75 |
| 15-B-Lung | A409363 | Biochain | Lung | PM | F/26 |
| 16-Am-Lung (L93) | 111P0103A | Ambion | Lung | PM | F/61 |
| 17-B-Lung (L92) | A503204 | Biochain | Lung | PM | M/28 |
| 18-Am-Ovary (O47) | 061P43A | Ambion | Ovary | PM | F/16 |
| 19-B-Ovary (O48) | A504087 | Biochain | Ovary | PM | F/51 |
| 20-B-Ovary (O46) | A504086 | Biochain | Ovary | PM | F/41 |
| 21-Am-Cervix | 101P0101A | Ambion | Cervix | PM | F/40 |
| 22-B-Cervix | A408211 | Biochain | Cervix | PM | F/36 |
| 23-B-Cervix | A504089 | Biochain | Cervix | PM-Pool of 5 | M&F |
| 24-B-Uterus | A411074 | Biochain | Uterus | PM-Pool of 10 | M&F |
| 25-B-Uterus | A409248 | Biochain | Uterus | PM | F/43 |
| 26-B-Uterus | A504090 | Biochain | Uterus | PM-Pool of 5 | M&F |
| 27-B-Bladder | A501157 | Biochain | Bladder | PM | M/29 |
| 28-Am-Bladder | 071P02C | Ambion | Bladder | PM | M/20 |
| 29-B-Bladder | A504088 | Biochain | Bladder | PM-Pool of 5 | M&F |
| 30-Am-Placenta | 021P33A | Ambion | Placenta | PB | F/33 |
| 31-B-Placenta | A410165 | Biochain | Placenta | PB | F/26 |
| 32-B-Placenta | A411073 | Biochain | Placenta | PB-Pool of 5 | M&F |
| 33-B-Breast (B59) | A607155 | Biochain | Breast | PM | F/36 |
| 34-Am-Breast (B63) | 26486 | Ambion | Breast | PM | F/43 |
| 35-Am-Breast (B64) | 23036 | Ambion | Breast | PM | F/57 |
| 36-Cl-Prostate (P53) | 1070317 | Clontech | Prostate | PB-Pool of 47 | M&F |
| 37-Am-Prostate (P42) | 061P04A | Ambion | Prostate | PM | M/47 |
| 38-Am-Prostate (P59) | 25955 | Ambion | Prostate | PM | M/62 |
| 39-Am-Testis | 111P0104A | Ambion | Testis | PM | M/25 |
| 40-B-Testis | A411147 | Biochain | Testis | PM | M/74 |
| 41-Cl-Testis | 1110320 | Clontech | Testis | PB-Pool of 45 | M&F |
| 42-CG-Adrenal | CG-184-10 | Ichilov | Adrenal | PM | F/81 |
| 43-B-Adrenal | A610374 | Biochain | Adrenal | PM | F/83 |
| 44-B-Heart | A411077 | Biochain | Heart | PB-Pool of 5 | M&F |
| 45-CG-Heart | CG-255-9 | Ichilov | Heart | PM | M/75 |
| 46-CG-Heart | CG-227-1 | Ichilov | Heart | PM | F/36 |
| 47-Am-Liver | 081P0101A | Ambion | Liver | PM | M/64 |
| 48-CG-Liver | CG-93-3 | Ichilov | Liver | PM | F/19 |
| 49-CG-Liver | CG-124-4 | Ichilov | Liver | PM | F/34 |
| 50-Cl-BM | 1110932 | Clontech | Bone Marrow | PM-Pool of 8 | M&F |
| 51-CGEN-Blood | WBC#5 | CGEN | Blood | | M |
| 52-CGEN-Blood | WBC#4 | CGEN | Blood | | M |
| 53-CGEN-Blood | WBC#3 | CGEN | Blood | | M |
| 54-CG-Spleen | CG-267 | Ichilov | Spleen | PM | F/25 |
| 55-CG-Spleen | 111P0106B | Ambion | Spleen | PM | M/25 |
| 56-CG-Spleen | A409246 | Biochain | Spleen | PM | F/12 |
| 56-CG-Thymus | CG-98-7 | Ichilov | Thymus | PM | F/28 |
| 58-Am-Thymus | 101P0101A | Ambion | Thymus | PM | M/14 |
| 59-B-Thymus | A409278 | Biochain | Thymus | PM | M/28 |
| 60-B-Thyroid | A610287 | Biochain | Thyroid | PM | M/27 |
| 61-B-Thyroid | A610286 | Biochain | Thyroid | PM | M/24 |
| 62-CG-Thyroid | CG-119-2 | Ichilov | Thyroid | PM | F/66 |
| 63-Cl-Salivary Gland | 1070319 | Clontech | Salivary Gland | PM-Pool of 24 | M&F |
| 64-Am-Kidney | 111P0101B | Ambion | Kidney | PM-Pool of 14 | M&F |
| 65-Cl-Kidney | 1110970 | Clontech | Kidney | PM-Pool of 14 | M&F |
| 66-B-Kidney | A411080 | Biochain | Kidney | PM-Pool of 5 | M&F |
| 67-CG-Cerebellum | CG-183-5 | Ichilov | Cerebellum | PM | M/74 |
| 68-CG-Cerebellum | CG-212-5 | Ichilov | Cerebellum | PM | M/54 |
| 69-B-Brain | A411322 | Biochain | Brain | PM | M/28 |
| 70-Cl-Brain | 1120022 | Clontech | Brain | PM-Pool of 2 | M&F |
| 71-B-Brain | A411079 | Biochain | Brain | PM-Pool of 2 | M&F |
| 72-CG-Brain | CG-151-1 | Ichilov | Brain | PM | F/86 |
| 73-Am-Skeletal Muscle | 101P013A | Ambion | Skeletal Muscle | PM | F/28 |
| 74-Cl-Skeletal Muscle | 1061038 | Clontech | Skeletal Muscle | PM-Pool of 2 | M&F |

TABLE 3_1 Sample id (GCI)/case id Tissue id Sample id (Asterand)(GCI)/Specimen (Asterand)/RNA old sample name sample name Source Lot no.id (Asternd) id (GCI)  7-B-Rectum 1-(7)-Bc-Rectum Biochain A610297 8-B-Rectum 2-(8)-Bc-Rectum Biochain A610298 new colon 3-GC-Colon GCICDSUV CDSUVNR3 new colon 4-As-Colon Asterand 16364 31802 31802B1 newcolon 5-As-Colon Asterand 22900 74446 74446B1 new small bowl 6-GC-Smallbowl GCI V9L7D V9L7DN6Z new small bowl 7-GC-Small bowl GCI M3GVTM3GVTN5R new small bowl 8-GC-Small bowl GCI 196S2 196S2AJN  9-Am-Stomach9-(9)-Am-Stomach Ambion 110P04A 10-B-Stomach 10-(10)-Bc-Stomach BiochainA501159 11-B-Esophagus 11-(11)-Bc-Esoph Biochain A603814 12-B-Esophagus12-(12)-Bc-Esoph Biochain A603813 new pancreas 13-As-Panc Asterand 89189442 9442C1 new pancreas 14-As-Panc Asterand 10082 11134 11134B148-CG-Liver 15-(48)-Ic-Liver Ichilov CG-93-3 new liver 16-As-LiverAsterand 7916 7203 7203B1 28-Am-Bladder 17-(28)-Am-Bladder Ambion071P02C 29-B-Bladder 18-(29)-Bc-Bladder Biochain A504088 64-Am-Kidney19-(64)-Am-Kidney Ambion 111P0101B 65-Cl-Kidney 20-(65)-Cl-KidneyClontech 1110970 66-B-Kidney 21-(66)-Bc-Kidney Biochain A411080 newkidney 22-GC-Kidney GCI N1EVZ N1EVZN91 new kidney 23-GC-Kidney GCI BMI6WBMI6WN9F 42-CG-Adrenal 24-(42)-Ic-Adrenal Ichilov CG-184-10 43-B-Adrenal25-(43)-Bc-Adrenal Biochain A610374 16-Am-Lung (L93) 26-(16)-Am-LungAmbion 111P0103A 17-B-Lung (L92) 27-(17)-Bc-Lung Biochain A503204 newlung 28-As-Lung Asterand 9078 9275 9275B1 new lung 29-As-Lung Asterand6692 6161 6161A1 new lung 30-As-Lung Asterand 7900 7180 7180F175-G-Ovary 31-(75)-GC-Ovary GCI L629FRV1 76-G-Ovary 32-(76)-GC-Ovary GCIDWHTZRQX 77-G-Ovary 33-(77)-GC-Ovary GCI FDPL9NJ6 78-G-Ovary34-(78)-GC-Ovary GCI GWXUZN5M 21-Am-Cervix 35-(21)-Am-Cerix Ambion101P0101A new cervix 36-GC-cervix GCI E2P2N E2P2NAP4 24-B-Uterus37-(24)-Bc-Uterus Biochain A411074 26-B-Uterus 38-(26)-Bc-UterusBiochain A504090 30-Am-Placenta 39-(30)-Am-Placen Ambion 021P33A32-B-Placenta 40-(32)-Bc-Placen Biochain 
A411073 new breast 41-GC-BreastGCI DHLR1 new breast 42-GC-Breast GCI TG6J6 new breast 43-GC-Breast GCIE6UDD E6UDDNCF 38-Am-Prostate (P59) 44-(38)-Am-Prostate Ambion 25955 addprostate from 45-Bc-Prostate Biochain A609258 prostate panel new testis46-As-Testis Asterand 13071 19567 19567B1 new testis 47-As-TestisAsterand 19671 42120 42120A1 ARTERY 48-GC-Artery GCI 7FUUP 7FUUPAMPARTERY 49-GC-Artery GCI YGTVY YGTVYAIN blood cells 50-Th-Blood-PBMCTel-Hashomer 52497 blood cells 51-Th-Blood-PBMC Tel-Hashomer 31055 bloodcells 52-Th-Blood-PBMC Tel-Hashomer 31058 54-CG-Spleen 53-(54)-Ic-SpleenIchilov CG-267 55-Am-Spleen 54-(55)-Am-Spleen Ambion 111P0106B57-CG-Thymus 55-(57)-Ic-Thymus Ichilov CG-98-7 58-Am-Thymus56-(58)-Am-Thymus Ambion 101P0101A 60-B-Thyroid 57-(60)-Bc-ThyroidBiochain A610287 62-CG-Thyroid 58-(62)-Ic-Thyroid Ichilov CG-119-2 newsalivary gland 59-Gc-Sali gland GCI NNSMV NNSMVNJC 67-CG-Cerebellum60-(67)-Ic-Cerebellum Ichilov CG-183-5 68-CG-Cerebellum61-(68)-Ic-Cerebellum Ichilov CG-212-5 69-B-Brain 62-(69)-Bc-BrainBiochain A411322 71-B-Brain 63-(71)-Bc-Brain Biochain A41107972-CG-Brain 64-(72)-Ic-Brain Ichilov CG-151-1 44-B-Heart65-(44)-Bc-Heart Biochain A411077 46-CG-Heart 66-(46)-Ic-Heart IchilovCG-227-1 45-CG-Heart (Fibrotic) 67-(45)-Ic-Heart (Fibrotic) IchilovCG-255-9 new skeletal muscle 68-GC-Skel Mus GCI T8YZS T8YZSN7O newskeletal muscle 69-GC-Skel Mus GCI Q3WKA Q3WKANCJ new skeletal muscle70-As-Skel Mus Asterand 8774 8235 8235G1 new skeletal muscle 71-As-SkelMus Asterand 8775 8244 8244A1 new skeletal muscle 72-As-Skel MusAsterand 10937 12648 12648C1 new skeletal muscle 73-As-Skel Mus Asterand6692 6166 6166A1

Materials and Experimental Procedures

RNA preparation—RNA was obtained from Clontech (Franklin Lakes, N.J. USA 07417, dot clontech dot com), BioChain Inst. Inc. (Hayward, Calif. 94545 USA dot biochain dot com), ABS (Wilmington, Del. 19801, USA, dot absbioreagents dot com) or Ambion (Austin, Tex. 78744 USA, dot ambion dot com). Alternatively, RNA was generated from tissue samples using TRI-Reagent (Molecular Research Center), according to the manufacturer's instructions. Tissue and RNA samples were obtained from patients or from postmortem. Total RNA samples were treated with DNaseI (Ambion) and purified using RNeasy columns (Qiagen).

RT PCR—Purified RNA (1 μg) was mixed with 150 ng Random Hexamer primers (Invitrogen) and 500 μM dNTP in a total volume of 15.6 μl. The mixture was incubated for 5 min at 65° C. and then quickly chilled on ice. Thereafter, 5 μl of 5× SuperscriptII first strand buffer (Invitrogen), 2.4 μl 0.1M DTT and 40 units RNasin (Promega) were added, and the mixture was incubated for 10 min at 25° C., followed by further incubation at 42° C. for 2 min. Then, 1 μl (200 units) of SuperscriptII (Invitrogen) was added and the reaction (final volume of 25 μl) was incubated for 50 min at 42° C. and then inactivated at 70° C. for 15 min. The resulting cDNA was diluted 1:20 in TE buffer (10 mM Tris pH=8, 1 mM EDTA pH=8).

Real-Time RT-PCR analysis—cDNA (5 μl), prepared as described above, was used as a template in Real-Time PCR reactions using the SYBR Green I assay (PE Applied Biosystem) with specific primers and UNG Enzyme (Eurogentech or ABI or Roche). The amplification was effected as follows: 50° C. for 2 min, 95° C. for 10 min, and then 40 cycles of 95° C. for 15 sec, followed by 60° C. for 1 min. Detection was performed by using the PE Applied Biosystem SDS 7000. The cycle in which the reactions achieved a threshold level (Ct) of fluorescence was registered and was used to calculate the relative transcript quantity in the RT reactions. The relative quantity was calculated using the equation $Q = \text{efficiency}^{-Ct}$. The efficiency of the PCR reaction was calculated from a standard curve, created by using serial dilutions of several reverse transcription (RT) reactions. To minimize inherent differences in the RT reaction, the resulting relative quantities were normalized to a normalization factor calculated by one of the following methods, as indicated in the text:

Method 1: The geometric mean of the relative quantities of the selected housekeeping (HSKP) genes was used as the normalization factor.

Method 2: The expression of several housekeeping (HSKP) genes was checked on every panel. The relative quantity (Q) of each housekeeping gene in each sample, calculated as described above, was divided by the median quantity of this gene in all panel samples to obtain the “relative Q rel to MED”. Then, for each sample, the median of the “relative Q rel to MED” of the selected housekeeping genes was calculated and served as the normalization factor of this sample for further calculations. Unless defined otherwise, the normalization of the Real-Time RT-PCR analysis results described herein was carried out according to method 1 above.
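By way of non-limiting illustration only, the relative quantity calculation and the two normalization methods described above may be sketched as follows. The function names, the efficiency value of 2.0 and all Ct values are hypothetical and are not taken from the experiments described herein.

```python
import math
from statistics import median


def relative_quantity(efficiency, ct):
    """Relative transcript quantity, Q = efficiency ** (-Ct)."""
    return efficiency ** (-ct)


def geometric_mean(values):
    """Geometric mean of positive numbers, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def method1_factors(hskp):
    """Method 1: per-sample geometric mean of the housekeeping-gene quantities.

    hskp maps gene name -> list of Q values, one per sample (same sample order).
    """
    columns = zip(*hskp.values())  # one tuple of gene quantities per sample
    return [geometric_mean(col) for col in columns]


def method2_factors(hskp):
    """Method 2: per-sample median of each gene's Q divided by that gene's panel median."""
    rel = [[q / median(qs) for q in qs] for qs in hskp.values()]
    return [median(col) for col in zip(*rel)]


# Hypothetical Ct values for two housekeeping genes across three samples.
efficiency = 2.0  # assumed value; in practice taken from the standard curve
ct = {"geneA": [20.0, 21.0, 22.0], "geneB": [24.0, 24.0, 26.0]}
hskp_q = {g: [relative_quantity(efficiency, c) for c in cts] for g, cts in ct.items()}

# Hypothetical target-amplicon Ct values, normalized by Method 1.
target_q = [relative_quantity(efficiency, c) for c in (25.0, 23.0, 27.0)]
normalized = [t / f for t, f in zip(target_q, method1_factors(hskp_q))]
print(normalized)
```

Swapping `method1_factors` for `method2_factors` in the last step gives the Method 2 normalization for the same hypothetical data.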

A schematic summary of quantitative real-time PCR analysis is presented in FIG. 3. As shown, the x-axis shows the cycle number. C_T denotes the threshold cycle, which is the cycle at which the amplification curve crosses the fluorescence threshold that was set in the experiment. This point is a calculated cycle number at which the PCR product signal is above the background level (passive dye ROX) and still in the geometric/exponential phase (as shown, once the level of fluorescence crosses the measurement threshold, it has a geometrically increasing phase, during which measurements are most accurate, followed by a linear phase and a plateau phase; for quantitative measurements, the latter two phases do not provide accurate measurements). The y-axis shows the normalized reporter fluorescence. It should be noted that this type of analysis provides relative quantification.
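As a non-limiting sketch only, a threshold cycle may be located on an amplification curve as follows. The curve values and the threshold are hypothetical, and linear interpolation between neighboring cycles is an assumption made for simplicity; it is not stated to be the method used by the detection instrument.

```python
def threshold_cycle(fluorescence, threshold):
    """Return the fractional cycle at which the curve first rises through `threshold`.

    fluorescence[0] is the reading at cycle 1, fluorescence[1] at cycle 2, and so on.
    Returns None if the curve never rises through the threshold (including the
    case where the very first reading is already above it).
    """
    for i in range(1, len(fluorescence)):
        lo, hi = fluorescence[i - 1], fluorescence[i]
        if lo < threshold <= hi:
            # lo was read at cycle i, hi at cycle i + 1; interpolate linearly between them.
            return i + (threshold - lo) / (hi - lo)
    return None


# Hypothetical background-subtracted readings that double every cycle.
curve = [1.0, 2.0, 4.0, 8.0, 16.0]
print(threshold_cycle(curve, 6.0))
```

Because the signal grows geometrically in the phase of interest, interpolating on the logarithm of the fluorescence would be an equally reasonable choice.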

The sequences of the housekeeping genes measured in all the examples in the testing panel were as follows:

Ubiquitin (GenBank Accession No. BC000449 (SEQ ID NO: 1711))
Ubiquitin Forward primer (SEQ ID NO: 326): ATTTGGGTCGCGGTTCTTG
Ubiquitin Reverse primer (SEQ ID NO: 327): TGCCTTGACATTCTCGATGGT
Ubiquitin-amplicon (SEQ ID NO: 328): ATTTGGGTCGCGGTTCTTGTTTGTGGATCGCTGTGATCGTCACTTGACAATGCAGATCTTCGTGAAGACTCTGACTGGTAAGACCATCACCCTCGAGGTTGAGCCCAGTGACACCATCGAGAATGTCAAGGCA

SDHA (GenBank Accession No. NM_004168 (SEQ ID NO: 1712))
SDHA Forward primer (SEQ ID NO: 329): TGGGAACAAGAGGGCATCTG
SDHA Reverse primer (SEQ ID NO: 330): CCACCACTGCATCAAATTCATG
SDHA-amplicon (SEQ ID NO: 331): TGGGAACAAGAGGGCATCTGCTAAAGTTTCAGATTCCATTTCTGCTCAGTATCCAGTAGTGGATCATGAATTTGATGCAGTGGTGG

PBGD (GenBank Accession No. BC019323 (SEQ ID NO: 1713))
PBGD Forward primer (SEQ ID NO: 332): TGAGAGTGATTCGCGTGGG
PBGD Reverse primer (SEQ ID NO: 333): CCAGGGTACGAGGCTTTCAAT
PBGD-amplicon (SEQ ID NO: 334): TGAGAGTGATTCGCGTGGGTACCCGCAAGAGCCAGCTTGCTCGCATACAGACGGACAGTGTGGTGGCAACATTGAAAGCCTCGTACCCTGG

HPRT1 (GenBank Accession No. NM_000194 (SEQ ID NO: 1714))
HPRT1 Forward primer (SEQ ID NO: 1295): TGACACTGGCAAAACAATGCA
HPRT1 Reverse primer (SEQ ID NO: 1296): GGTCCTTTTCACCAGCAAGCT
HPRT1-amplicon (SEQ ID NO: 1297): TGACACTGGCAAAACAATGCAGACTTTGCTTTCCTTGGTCAGGCAGTATAATCCAAAGATGGTCAAGGTCGCAAGCTTGCTGGTGAAAAGGACC

The sequences of the housekeeping genes measured in all the examples on the normal tissue samples panel were as follows:

RPL19 (GenBank Accession No. NM_000981 (SEQ ID NO: 1715))
RPL19 Forward primer (SEQ ID NO: 1298): TGGCAAGAAGAAGGTCTGGTTAG
RPL19 Reverse primer (SEQ ID NO: 1420): TGATCAGCCCATCTTTGATGAG
RPL19-amplicon (SEQ ID NO: 1630): TGGCAAGAAGAAGGTCTGGTTAGACCCCAATGAGACCAATGAAATCGCCAATGCCAACTCCCGTCAGCAGATCCGGAAGCTCATCAAAGATGGGCTGATCA

TATA box (GenBank Accession No. NM_003194 (SEQ ID NO: 1716))
TATA box Forward primer (SEQ ID NO: 1631): CGGTTTGCTGCGGTAATCAT
TATA box Reverse primer (SEQ ID NO: 1632): TTTCTTGCTGCCAGTCTGGAC
TATA box-amplicon (SEQ ID NO: 1633): CGGTTTGCTGCGGTAATCATGAGGATAAGAGAGCCACGAACCACGGCACTGATTTTCAGTTCTGGGAAAATGGTGTGCACAGGAGCCAAGAGTGAAGAACAGTCCAGACTGGCAGCAAGAAA

Ubiquitin (GenBank Accession No. BC000449 (SEQ ID NO: 1711))
Ubiquitin Forward primer (SEQ ID NO: 326): ATTTGGGTCGCGGTTCTTG
Ubiquitin Reverse primer (SEQ ID NO: 327): TGCCTTGACATTCTCGATGGT
Ubiquitin-amplicon (SEQ ID NO: 328): ATTTGGGTCGCGGTTCTTGTTTGTGGATCGCTGTGATCGTCACTTGACAATGCAGATCTTCGTGAAGACTCTGACTGGTAAGACCATCACCCTCGAGGTTGAGCCCAGTGACACCATCGAGAATGTCAAGGCA

SDHA (GenBank Accession No. NM_004168 (SEQ ID NO: 1712))
SDHA Forward primer (SEQ ID NO: 329): TGGGAACAAGAGGGCATCTG
SDHA Reverse primer (SEQ ID NO: 330): CCACCACTGCATCAAATTCATG
SDHA-amplicon (SEQ ID NO: 331): TGGGAACAAGAGGGCATCTGCTAAAGTTTCAGATTCCATTTCTGCTCAGTATCCAGTAGTGGATCATGAATTTGATGCAGTGGTGG
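As a non-limiting illustration, the relationship between a primer pair and its amplicon in the listings above can be checked by confirming that the amplicon begins with the forward primer and ends with the reverse complement of the reverse primer. The sketch below uses the Ubiquitin sequences given above; the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def primers_flank_amplicon(forward, reverse, amplicon):
    """True if the amplicon starts with the forward primer and ends with the
    reverse complement of the reverse primer."""
    return amplicon.startswith(forward) and amplicon.endswith(reverse_complement(reverse))


# Ubiquitin primers and amplicon as listed above (SEQ ID NOs: 326, 327 and 328).
UBQ_FORWARD = "ATTTGGGTCGCGGTTCTTG"
UBQ_REVERSE = "TGCCTTGACATTCTCGATGGT"
UBQ_AMPLICON = (
    "ATTTGGGTCGCGGTTCTTGTTTGTGGATCGCTGTGATCGTCACTTGACAATGCAGATCTTCGTGAAGACTC"
    "TGACTGGTAAGACCATCACCCTCGAGGTTGAGCCCAGTGACACCATCGAGAATGTCAAGGCA"
)

print(primers_flank_amplicon(UBQ_FORWARD, UBQ_REVERSE, UBQ_AMPLICON))
```

The same check can be applied to any of the other primer pairs and amplicons listed herein.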

Oligonucleotide-Based Micro-Array Experiment Protocol—

Microarray Fabrication

Microarrays (chips) were printed by pin deposition using the MicroGrid II MGII 600 robot from BioRobotics Limited (Cambridge, UK). 50-mer oligonucleotide target sequences were designed by Compugen Ltd (Tel-Aviv, IL) as described by A. Shoshan et al, “Optical technologies and informatics”, Proceedings of SPIE. Vol 4266, pp. 86-95 (2001). The designed oligonucleotides were synthesized and purified by desalting with the Sigma-Genosys system (The Woodlands, Tex., US), and all of the oligonucleotides were joined to a C6 amino-modified linker at the 5′ end, or were attached directly to CodeLink slides (Cat #25-6700-01, Amersham Bioscience, Piscataway, N.J., US). The 50-mer oligonucleotides, forming the target sequences, were first suspended in Ultra-pure DDW (Cat #01-866-1A, Kibbutz Beit-Haemek, Israel) to a concentration of 50 μM. Before printing the slides, the oligonucleotides were resuspended in 300 mM sodium phosphate (pH 8.5) to a final concentration of 150 mM and printed at 35-40% relative humidity at 21° C.
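The concentrations quoted above are arithmetically consistent with mixing equal volumes of the 50 μM oligonucleotide suspension and the 300 mM sodium phosphate buffer, which would give 150 mM buffer and the 25 μM oligonucleotide concentration mentioned with regard to FIG. 4. The 1:1 mixing ratio is an assumption made here for illustration only and is not stated in the protocol; the volumes below are likewise hypothetical.

```python
def diluted(concentration, volume, total_volume):
    """Concentration after a solution of `volume` is brought to `total_volume`."""
    return concentration * volume / total_volume


oligo_stock_uM = 50.0    # oligonucleotide suspension in DDW (from the text)
buffer_stock_mM = 300.0  # sodium phosphate, pH 8.5 (from the text)
volume_each_ul = 10.0    # hypothetical volume of each component (assumed 1:1 mix)

total_ul = 2 * volume_each_ul
oligo_final_uM = diluted(oligo_stock_uM, volume_each_ul, total_ul)
buffer_final_mM = diluted(buffer_stock_mM, volume_each_ul, total_ul)
print(oligo_final_uM, buffer_final_mM)
```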

Each slide contained a total of 9792 features in 32 subarrays. Of these features, 4224 features were sequences of interest according to the present invention and negative controls that were printed in duplicate. An additional 288 features (96 target sequences printed in triplicate) contained housekeeping genes from Human Evaluation Library2, Compugen Ltd, Israel. Another 384 features are E. coli spikes 1-6, which are oligos to E. coli genes that are commercially available in the Array Control product (Array control, sense oligo spots, Ambion Inc. Austin, Tex. Cat #1781, Lot #112K06).

Post-Coupling Processing of Printed Slides

After the spotting of the oligonucleotides to the glass (CodeLink) slides, the slides were incubated for 24 hours in a sealed saturated NaCl humidification chamber (relative humidity 70-75%).

Slides were treated for blocking of the residual reactive groups by incubating them in blocking solution at 50° C. for 15 minutes (10 ml/slide of buffer containing 0.1 M Tris, 50 mM ethanolamine, 0.1% SDS). The slides were then rinsed twice with Ultra-pure DDW (double distilled water). The slides were then washed with wash solution (10 ml/slide; 4×SSC, 0.1% SDS) at 50° C. for 30 minutes on the shaker. The slides were then rinsed twice with Ultra-pure DDW, followed by drying by centrifugation for 3 minutes at 800 rpm.

Next, in order to assist in automatic operation of the hybridization protocol, the slides were treated with Ventana Discovery hybridization station barcode adhesives. The printed slides were loaded on a Bio-Optica (Milan, Italy) hematology staining device and were incubated for 10 minutes in 50 ml of 3-Aminopropyl Triethoxysilane (Sigma A3648 lot #122K589). Excess fluid was dried and slides were then incubated for three hours at 20 mm Hg in a dark vacuum desiccator (Pelco 2251, Ted Pella, Inc. Redding Calif.).

The following protocol was then followed with the Genisphere 900-RP (random primer), with mini elute columns on the Ventana Discovery HybStation™, to perform the microarray experiments. Briefly, the protocol was performed as described with regard to the instructions and information provided with the device itself. The protocol included cDNA synthesis and labeling. cDNA concentration was measured with the TBS-380 (Turner Biosystems, Sunnyvale, Calif.) PicoFluor, which is used with the OliGreen ssDNA Quantitation reagent and kit.

Hybridization was performed with the Ventana Hybridization device, according to the provided protocols (Discovery Hybridization Station, Tucson, Ariz.).

The slides were then scanned with a GenePix 4000B dual laser scanner from Axon Instruments Inc, and analyzed by GenePix Pro 5.0 software.

A schematic summary of the oligonucleotide based microarray fabrication and the experimental flow is presented in FIGS. 4 and 5.

Briefly, as shown in FIG. 4, DNA oligonucleotides at 25 μM were deposited (printed) onto Amersham ‘CodeLink’ glass slides, generating a well defined ‘spot’. These slides are covered with a long-chain, hydrophilic polymer chemistry that creates an active 3-D surface that covalently binds the DNA oligonucleotides at the 5′-end via the C6-amine modification. This binding ensures that the full length of the DNA oligonucleotides is available for hybridization to the cDNA and also allows lower background, high sensitivity and reproducibility.

FIG. 5 shows a schematic method for performing the microarray experiments. It should be noted that stages on the left-hand or right-hand side may optionally be performed in any order, including in parallel, until stage 4 (hybridization). Briefly, on the left-hand side, the target oligonucleotides are spotted on a glass microscope slide (although optionally other materials could be used) to form a spotted slide (stage 1). On the right-hand side, control sample RNA and cancer sample RNA are Cy3 and Cy5 labeled, respectively (stage 2), to form labeled probes. It should be noted that the control and cancer samples come from corresponding tissues (for example, normal prostate tissue and cancerous prostate tissue). Furthermore, the tissue from which the RNA was taken is indicated below in the specific examples of data for particular clusters, with regard to overexpression of an oligonucleotide from a “chip” (microarray), as for example “prostate” for chips in which prostate cancerous tissue and normal tissue were tested as described above. In stage 3, the probes are mixed. In stage 4, hybridization is performed to form a processed slide. In stage 5, the slide is washed and scanned to form an image file, followed by data analysis in stage 6.

The following clusters were found to be overexpressed in lung cancer:

W60282_PEA_1

F05068_PEA_1

H38804_PEA_1

HSENA78

T39971

(R00299)

H14624

Z41644_PEA_1

Z25299_PEA_2

HSSTROL3

HUMTREFAC_PEA_2

HSS100PCB

HSU33147_PEA_1

HUMCA1XIA

H61775

HUMGRP5E

HUMODCA

AA161187

R66178

D56406_PEA_1

M85491_PEA_1

Z21368_PEA_1

HUMCA1XIA

R20779

R38144_PEA_2

Z44808_PEA_1

HUMOSTRO_PEA_1_PEA_1

R11723_PEA_3

AI076020

T23580

M79217_PEA_1

M62096_PEA_1

M78076_PEA_1

T99080_PEA_4

T08446_PEA_1

R16276_PEA_1

The following clusters were found to be overexpressed in lung small cell cancer:

H61775

HUMGRP5E

M85491_PEA_1

Z44808_PEA_1

AA161187

R66178

HUMPHOSLIP_PEA_2

AI076020

T23580

M79217_PEA_1

M62096_PEA_1

M78076_PEA_1

T99080_PEA_4

T08446_PEA_1

The following clusters were found to be overexpressed in lung adenocarcinoma:

R00299

M85491_PEA_1

Z21368_PEA_1

HUMCA1XIA

AA161187

R66178

T11628_PEA_1

The following clusters were found to be overexpressed in lung squamous cell carcinoma:

HUMODCA

R00299

D56406_PEA_1

Z44808_PEA_1

Z21368_PEA_1

HUMCA1XIA

AA161187

R66178

HUMCEA_PEA_1

R35137_PEA_1_PEA_1_PEA_1

Description for Cluster H61775

Cluster H61775 features 2 transcript(s) and 6 segment(s) of interest, the names for which are given in Tables 4 and 5, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 6.

TABLE 4 Transcripts of interest
Transcript Name    Sequence ID No.
H61775_T21         1
H61775_T22         2

TABLE 5 Segments of interest
Segment Name       Sequence ID No.
H61775_node_2      151
H61775_node_4      152
H61775_node_6      153
H61775_node_8      154
H61775_node_0      155
H61775_node_5      156

TABLE 6 Proteins of interest
Protein Name       Sequence ID No.
H61775_P16         1281
H61775_P17         1282

Cluster H61775 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term “number” in the right hand column of the table and the numbers on the y-axis of FIG. 6 refer to weighted expression of ESTs in each category, as “parts per million” (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
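As a minimal sketch of the “parts per million” ratio described above, the following assumes simple, unweighted EST counts. The weighting scheme used for the actual values is described elsewhere in the application and is not reproduced here; the function name and the counts are hypothetical.

```python
def parts_per_million(cluster_ests, category_ests):
    """ESTs of one cluster as parts per million of all ESTs in a tissue category."""
    if category_ests <= 0:
        raise ValueError("category_ests must be positive")
    return 1_000_000 * cluster_ests / category_ests


# Hypothetical counts: 4 ESTs of the cluster among 500,000 ESTs in the category.
print(parts_per_million(4, 500_000))
```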

Overall, the following results were obtained as shown with regard to the histograms in FIG. 6 and Table 7. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: brain malignant tumors and a mixture of malignant tumors from different tissues.

TABLE 7 Normal tissue distribution
Name of Tissue    Number
bladder           0
brain             0
colon             0
epithelial        10
general           3
breast            8
muscle            0
ovary             0
pancreas          0
prostate          0
uterus            0

TABLE 8 P values and ratios for expression in cancerous tissue
Name of Tissue    P1        P2        SP1       R3     SP2       R4
bladder           3.1e−01   3.8e−01   3.2e−01   2.5    4.6e−01   1.9
brain             8.8e−02   6.5e−02   1         3.5    4.1e−04   5.8
colon             5.6e−01   6.4e−01   1         1.1    1         1.1
epithelial        3.0e−02   1.3e−01   2.3e−02   2.1    3.2e−01   1.2
general           1.3e−06   4.9e−05   1.0e−07   6.3    1.5e−06   4.3
breast            4.7e−01   3.7e−01   3.3e−01   2.0    4.6e−01   1.6
muscle            2.3e−01   2.9e−01   1.5e−01   6.8    3.9e−01   2.6
ovary             3.8e−01   4.2e−01   1.5e−01   2.4    2.6e−01   1.9
pancreas          3.3e−01   4.4e−01   4.2e−01   2.4    5.3e−01   1.9
prostate          7.3e−01   7.8e−01   6.7e−01   1.5    7.5e−01   1.3
uterus            1.0e−01   2.6e−01   2.9e−01   2.6    5.1e−01   1.8

As noted above, contig H61775 features 2 transcript(s), which were listed in Table 4 above. A description of each variant protein according to the present invention is now provided.

Variant protein H61775_P16 (SEQ ID NO:1281) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) H61775_T21 (SEQ ID NO:1). One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between H61775_P16 (SEQ ID NO:1281) and Q9P2J2 (SEQ ID NO:1694):

1. An isolated chimeric polypeptide encoding for H61775_P16 (SEQ ID NO:1281), comprising a first amino acid sequence being at least 90% homologous to MVWCLGLAVLSLVISQGADGRGKPEVVSVVGRAGESVVLGCDLLPPAGRPPLHVIEWLRFGFLLPIFIQFGLYSPRIDPDYVG corresponding to amino acids 11-93 of Q9P2J2 (SEQ ID NO:1694), which also corresponds to amino acids 1-83 of H61775_P16 (SEQ ID NO:1281), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DCGFPAFRELKRAETVSPVFFTRRCIWEDLKSTGFSPAGGGRPPGGGPRTQEDSGLPCWRSSCSVTLQV (SEQ ID NO: 1754) corresponding to amino acids 84-152 of H61775_P16 (SEQ ID NO:1281), wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of H61775_P16 (SEQ ID NO:1281), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DCGFPAFRELKRAETVSPVFFTRRCIWEDLKSTGFSPAGGGRPPGGGPRTQEDSGLPCWRSSCSVTLQV (SEQ ID NO: 1754) in H61775_P16 (SEQ ID NO:1281).

Comparison report between H61775_P16 (SEQ ID NO:1281) and AAQ88495 (SEQ ID NO:1695):

1. An isolated chimeric polypeptide encoding for H61775_P16 (SEQ ID NO:1281), comprising a first amino acid sequence being at least 90% homologous to MVWCLGLAVLSLVISQGADGRGKPEVVSVVGRAGESVVLGCDLLPPAGRPPLHVIEWLRFGFLLPIFIQFGLYSPRIDPDYVG corresponding to amino acids 1-83 of AAQ88495 (SEQ ID NO:1695), which also corresponds to amino acids 1-83 of H61775_P16 (SEQ ID NO:1281), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DCGFPAFRELKRAETVSPVFFTRRCIWEDLKSTGFSPAGGGRPPGGGPRTQEDSGLPCWRSSCSVTLQV (SEQ ID NO: 1754) corresponding to amino acids 84-152 of H61775_P16 (SEQ ID NO:1281), wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of H61775_P16 (SEQ ID NO:1281), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DCGFPAFRELKRAETVSPVFFTRRCIWEDLKSTGFSPAGGGRPPGGGPRTQEDSGLPCWRSSCSVTLQV (SEQ ID NO: 1754) in H61775_P16 (SEQ ID NO:1281).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein H61775_P16 (SEQ ID NO:1281) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 9 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein H61775_P16 (SEQ ID NO:1281) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 9 Amino acid mutations
SNP position(s) on amino acid sequence    Alternative amino acid(s)    Previously known SNP?
14                                        I -> T                       No
138                                       G -> R                       No
34                                        G -> E                       Yes
48                                        G -> R                       No
91                                        R -> *                       Yes
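As a non-limiting consistency check, the reference residues listed in Table 9 can be compared against the two sequence portions quoted in the comparison reports above. The sketch assumes, for illustration only, that H61775_P16 is exactly the concatenation of the 83-residue first sequence and the 69-residue tail; the authoritative sequence is the one given at the end of the application. The function name is hypothetical.

```python
# Sequence portions as quoted in the comparison reports above.
HEAD = (
    "MVWCLGLAVLSLVISQGADGRGKPEVVSVVGRAGESVVLGCDLLPPAGRPPLHVIEWLRF"
    "GFLLPIFIQFGLYSPRIDPDYVG"
)
TAIL = "DCGFPAFRELKRAETVSPVFFTRRCIWEDLKSTGFSPAGGGRPPGGGPRTQEDSGLPCWRSSCSVTLQV"
P16 = HEAD + TAIL  # assumption: exact concatenation, amino acids 1-83 and 84-152

# 1-based position -> reference residue, taken from the left side of each
# substitution in Table 9.
TABLE_9_REFERENCE = {14: "I", 138: "G", 34: "G", 48: "G", 91: "R"}


def reference_mismatches(protein, expected):
    """Return {position: (expected, found)} for residues that disagree."""
    return {
        pos: (ref, protein[pos - 1])
        for pos, ref in expected.items()
        if protein[pos - 1] != ref
    }


print(len(P16), reference_mismatches(P16, TABLE_9_REFERENCE))
```

An empty mismatch dictionary indicates that every position in Table 9 carries the residue named on the left side of the corresponding substitution.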

Variant protein H61775_P16 (SEQ ID NO:1281) is encoded by the following transcript(s): H61775_T21 (SEQ ID NO:1), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript H61775_T21 (SEQ ID NO:1) is shown in bold; this coding portion starts at position 261 and ends at position 716. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein H61775_P16 (SEQ ID NO:1281) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 10 Nucleic acid SNPs
SNP position on nucleotide sequence    Alternative nucleic acid    Previously known SNP?
117                                    T -> C                      Yes
200                                    T -> C                      No
672                                    G -> C                      No
222                                    T -> C                      Yes
301                                    T -> C                      No
361                                    G -> A                      Yes
377                                    G -> A                      No
400                                    -> C                        No
402                                    G -> C                      No
531                                    C -> T                      Yes
566                                    T -> C                      No
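By way of illustration only, a nucleotide position within the coding portion can be converted to the number of the codon (and hence amino acid) in which it falls, using the stated coding start at position 261. The function name is hypothetical; positions outside the stated coding portion are reported as None.

```python
CDS_START, CDS_END = 261, 716  # coding portion of H61775_T21, as stated above

# SNP positions from Table 10, in the order listed.
TABLE_10_POSITIONS = [117, 200, 672, 222, 301, 361, 377, 400, 402, 531, 566]


def codon_number(nt_pos, cds_start=CDS_START, cds_end=CDS_END):
    """1-based codon number containing nucleotide `nt_pos`, or None if the
    position lies outside the coding portion."""
    if not cds_start <= nt_pos <= cds_end:
        return None
    return (nt_pos - cds_start) // 3 + 1


mapping = {pos: codon_number(pos) for pos in TABLE_10_POSITIONS}
for pos, codon in mapping.items():
    print(pos, codon)
```

Under this mapping, nucleotide positions 301, 361, 402, 531 and 672 fall in codons 14, 34, 48, 91 and 138, respectively, which are the amino acid positions listed for H61775_P16 in Table 9.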

Variant protein H61775_P17 (SEQ ID NO:1282) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) H61775_T22 (SEQ ID NO:2). One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between H61775_P17 (SEQ ID NO:1282) and Q9P2J2 (SEQ ID NO:1694):

1. An isolated chimeric polypeptide encoding for H61775_P17 (SEQ ID NO:1282), comprising a first amino acid sequence being at least 90% homologous to MVWCLGLAVLSLVISQGADGRGKPEVVSVVGRAGESVVLGCDLLPPAGRPPLHVIEWLRFGFLLPIFIQFGLYSPRIDPDYVG corresponding to amino acids 11-93 of Q9P2J2 (SEQ ID NO:1694), which also corresponds to amino acids 1-83 of H61775_P17 (SEQ ID NO:1282).

Comparison report between H61775_P17 (SEQ ID NO:1282) and AAQ88495 (SEQ ID NO:1695):

1. An isolated chimeric polypeptide encoding for H61775_P17 (SEQ ID NO:1282), comprising a first amino acid sequence being at least 90% homologous to MVWCLGLAVLSLVISQGADGRGKPEVVSVVGRAGESVVLGCDLLPPAGRPPLHVIEWLRFGFLLPIFIQFGLYSPRIDPDYVG corresponding to amino acids 1-83 of AAQ88495 (SEQ ID NO:1695), which also corresponds to amino acids 1-83 of H61775_P17 (SEQ ID NO:1282).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein H61775_P17 (SEQ ID NO:1282) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 11 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein H61775_P17 (SEQ ID NO:1282) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 11 Amino acid mutations
SNP position(s) on amino acid sequence    Alternative amino acid(s)    Previously known SNP?
14                                        I -> T                       No
34                                        G -> E                       Yes
48                                        G -> R                       No

Variant protein H61775_P17 (SEQ ID NO:1282) is encoded by the following transcript(s): H61775_T22 (SEQ ID NO:2), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript H61775_T22 (SEQ ID NO:2) is shown in bold; this coding portion starts at position 261 and ends at position 509. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein H61775_P17 (SEQ ID NO:1282) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 12 Nucleic acid SNPs
SNP position on nucleotide sequence    Alternative nucleic acid    Previously known SNP?
117                                    T -> C                      Yes
200                                    T -> C                      No
222                                    T -> C                      Yes
301                                    T -> C                      No
361                                    G -> A                      Yes
377                                    G -> A                      No
400                                    -> C                        No
402                                    G -> C                      No
596                                    T -> A                      Yes
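As a further non-limiting check, the stated coding portions can be converted to codon counts and compared with the lengths of the variant proteins. The sketch assumes that the stated end position is inclusive; that the stated coding portions do not include a stop codon is an inference drawn from the resulting lengths, not a statement made in the text. The function name is hypothetical.

```python
def coding_codons(start, end):
    """Number of whole codons in a coding portion with inclusive 1-based ends."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"coding portion of {length} nt is not a multiple of 3")
    return length // 3


# Coding portions as stated for the two transcripts.
CODING = {"H61775_T21": (261, 716), "H61775_T22": (261, 509)}

for name, (start, end) in CODING.items():
    print(name, coding_codons(start, end))
```

The resulting counts of 152 and 83 codons agree with the amino acid ranges 1-152 and 1-83 recited above for H61775_P16 and H61775_P17, respectively.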

As noted above, cluster H61775 features 6 segment(s), which were listed in Table 5 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster H61775_node_2 (SEQ ID NO:1022) according to the present invention is supported by 17 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H61775_T21 (SEQ ID NO:1) and H61775_T22 (SEQ ID NO:2). Table 13 below describes the starting and ending position of this segment on each transcript.

TABLE 13 Segment location on transcripts
Transcript name               Segment starting position    Segment ending position
H61775_T21 (SEQ ID NO: 1)     87                           318
H61775_T22 (SEQ ID NO: 2)     87                           318

Segment cluster H61775_node_4 (SEQ ID NO:1023) according to the present invention is supported by 20 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H61775_T21 (SEQ ID NO:1) and H61775_T22 (SEQ ID NO:2). Table 14 below describes the starting and ending position of this segment on each transcript.

TABLE 14 Segment location on transcripts
Transcript name               Segment starting position    Segment ending position
H61775_T21 (SEQ ID NO: 1)     319                          507
H61775_T22 (SEQ ID NO: 2)     319                          507

Segment cluster H61775_node_6 (SEQ ID NO:1024) according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H61775_T22 (SEQ ID NO:2). Table 15 below describes the starting and ending position of this segment on each transcript.

TABLE 15 Segment location on transcripts
Transcript name               Segment starting position    Segment ending position
H61775_T22 (SEQ ID NO: 2)     515                          715

Segment cluster H61775_node_8 (SEQ ID NO:1025) according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H61775_T21 (SEQ ID NO:1). Table 16 below describes the starting and ending position of this segment on each transcript.

TABLE 16 Segment location on transcripts
Transcript name               Segment starting position    Segment ending position
H61775_T21 (SEQ ID NO: 1)     508                          1205

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.

Segment cluster H61775_node_0 (SEQ ID NO:1026) according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H61775_T21 (SEQ ID NO:1) and H61775_T22 (SEQ ID NO:2). Table 17 below describes the starting and ending position of this segment on each transcript.

TABLE 17 Segment location on transcripts
Transcript name               Segment starting position    Segment ending position
H61775_T21 (SEQ ID NO: 1)     1                            86
H61775_T22 (SEQ ID NO: 2)     1                            86

Segment cluster H61775_node_5 (SEQ ID NO:1027) according to the present invention can be found in the following transcript(s): H61775_T22 (SEQ ID NO:2). Table 18 below describes the starting and ending position of this segment on each transcript.

TABLE 18 Segment location on transcripts
Transcript name               Segment starting position    Segment ending position
H61775_T22 (SEQ ID NO: 2)     508                          514
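The segment coordinates in Tables 13 to 18 can be checked, as a non-limiting illustration, to confirm that the segments of each transcript follow one another without gaps or overlaps. The data structure and function name below are hypothetical; the coordinates are those given in the tables.

```python
# (segment name, start, end) per transcript, from Tables 13 to 18.
SEGMENTS = {
    "H61775_T21": [
        ("H61775_node_0", 1, 86),
        ("H61775_node_2", 87, 318),
        ("H61775_node_4", 319, 507),
        ("H61775_node_8", 508, 1205),
    ],
    "H61775_T22": [
        ("H61775_node_0", 1, 86),
        ("H61775_node_2", 87, 318),
        ("H61775_node_4", 319, 507),
        ("H61775_node_5", 508, 514),
        ("H61775_node_6", 515, 715),
    ],
}


def tiling_problems(segments):
    """Return (name, expected_start, actual_start) for every segment that does
    not begin immediately after the previous one (first segment must start at 1)."""
    problems = []
    expected = 1
    for name, start, end in sorted(segments, key=lambda s: s[1]):
        if start != expected:
            problems.append((name, expected, start))
        expected = end + 1
    return problems


for transcript, segs in SEGMENTS.items():
    print(transcript, tiling_problems(segs))
```

An empty list for a transcript indicates that its segments are contiguous from position 1 to the end of the last listed segment.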

Microarray (chip) data is also available for this gene as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment (with regard to lung cancer), shown in Table 19.

TABLE 19 Oligonucleotides related to this gene
Oligonucleotide name               Overexpressed in cancers    Chip reference
H61775_0_11_0 (SEQ ID NO: 204)     Lung cancer                 Lung

Variant protein alignment to the previously known protein:

Sequence name: /tmp/Psw0RJLCti/aLAXQjXh07:Q9P2J2 (SEQ ID NO:1694)
Sequence documentation: Alignment of: H61775_P16 (SEQ ID NO:1281) x Q9P2J2 (SEQ ID NO:1694)
Alignment segment 1/1:
Quality: 803.00    Escore: 0
Matching length: 83    Total length: 83
Matching Percent Similarity: 100.00    Matching Percent Identity: 100.00
Total Percent Similarity: 100.00    Total Percent Identity: 100.00
Gaps: 0
Alignment: [alignment rows not reproduced]

Sequence name: /tmp/Psw0RJLCti/aLAXQjXh07:AAQ88495 (SEQ ID NO:1695)
Sequence documentation: Alignment of: H61775_P16 (SEQ ID NO:1281) x AAQ88495 (SEQ ID NO:1695)
Alignment segment 1/1:
Quality: 803.00    Escore: 0
Matching length: 83    Total length: 83
Matching Percent Similarity: 100.00    Matching Percent Identity: 100.00
Total Percent Similarity: 100.00    Total Percent Identity: 100.00
Gaps: 0
Alignment: [alignment rows not reproduced]

Sequence name: /tmp/naab8yR3GC/pSM4l2IL5o:Q9P2J2 (SEQ ID NO:1694)
Sequence documentation: Alignment of: H61775_P17 (SEQ ID NO:1282) x Q9P2J2 (SEQ ID NO:1694)
Alignment segment 1/1:
Quality: 803.0    Escore: 0
Matching length: 83    Total length: 83
Matching Percent Similarity: 100.00    Matching Percent Identity: 100.00
Total Percent Similarity: 100.00    Total Percent Identity: 100.00
Gaps: 0
Alignment: [alignment rows not reproduced]

Sequence name: /tmp/naab8yR3GC/pSM4l2IL5o:AAQ88495 (SEQ ID NO:1695)
Sequence documentation: Alignment of: H61775_P17 (SEQ ID NO:1282) x AAQ88495 (SEQ ID NO:1695)
Alignment segment 1/1:
Quality: 803.00    Escore: 0
Matching length: 83    Total length: 83
Matching Percent Similarity: 100.00    Matching Percent Identity: 100.00
Total Percent Similarity: 100.00    Total Percent Identity: 100.00
Gaps: 0
Alignment: [alignment rows not reproduced]

Expression of immunoglobulin superfamily, member 9, H61775 transcripts which are detectable by amplicon as depicted in sequence name H61775seg8 (SEQ ID NO: 1636) in normal and cancerous lung tissues

Expression of immunoglobulin superfamily, member 9 transcripts detectable by or according to seg8, H61775seg8 amplicon (SEQ ID NO: 1636) and H61775seg8F2 (SEQ ID NO: 1634) and H61775seg8R2 (SEQ ID NO: 1635) primers was measured by real time PCR. In parallel, the expression of four housekeeping genes was measured similarly: PBGD (GenBank Accession No. BC019323 (SEQ ID NO:1713); amplicon: PBGD-amplicon, SEQ ID NO:334; primers SEQ ID NOs 332 and 333), HPRT1 (GenBank Accession No. NM_000194 (SEQ ID NO:1714); amplicon: HPRT1-amplicon, SEQ ID NO:1297; primers SEQ ID NOs 1295 and 1296), Ubiquitin (GenBank Accession No. BC000449 (SEQ ID NO:1711); amplicon: Ubiquitin-amplicon, SEQ ID NO:328; primers SEQ ID NOs 326 and 327) and SDHA (GenBank Accession No. NM_004168 (SEQ ID NO:1712); amplicon: SDHA-amplicon, SEQ ID NO:331; primers SEQ ID NOs 329 and 330). For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 47-50, 90-93, 96-99, Table 2, “Tissue samples in testing panel”), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples.
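The fold up-regulation calculation described above may be sketched as follows, by way of non-limiting illustration. The sample identifiers and normalized quantities are hypothetical and do not correspond to the samples of Table 2; the function name is likewise hypothetical.

```python
from statistics import median


def fold_upregulation(normalized, normal_ids):
    """Divide each sample's normalized quantity by the median normalized
    quantity of the normal reference samples."""
    reference = median(normalized[s] for s in normal_ids)
    return {sample: q / reference for sample, q in normalized.items()}


# Hypothetical normalized quantities: three normal PM samples and two tumors.
normalized_q = {"N1": 1.0, "N2": 2.0, "N3": 3.0, "T1": 20.0, "T2": 6.0}
folds = fold_upregulation(normalized_q, ["N1", "N2", "N3"])

# Samples with at least 5 fold over-expression relative to the normal median.
over_expressed = sorted(s for s, f in folds.items() if f >= 5)
print(folds, over_expressed)
```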

FIG. 7 is a histogram showing over expression of the above-indicated immunoglobulin superfamily, member 9 transcripts in cancerous lung samples relative to the normal samples. The number and percentage of samples that exhibit at least 5 fold over-expression, out of the total number of samples tested, is indicated at the bottom. As is evident from FIG. 7, the expression of immunoglobulin superfamily, member 9 transcripts detectable by the above amplicon(s) in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 47-50, 90-93, 96-99, Table 2 “Tissue samples in testing panel”). Notably, an over-expression of at least 5 fold was found in 11 out of 15 adenocarcinoma samples, 12 out of 16 squamous cell carcinoma samples, 1 out of 4 large cell carcinoma samples and in 8 out of 8 small cell carcinoma samples.

Statistical analysis was applied to verify the significance of these results, as described below.

The P value for the difference in the expression levels of immunoglobulin superfamily, member 9 transcripts detectable by the above amplicon in lung cancer samples versus the normal tissue samples was determined by T test as 6.5E-02. In adenocarcinoma, the minimum values were 7.62E-03 in squamous cell adenocarcinoma cancer and 1.5E-03 in small cell carcinoma.

A threshold of 5 fold overexpression was found to differentiate between cancer and normal samples with a P value of 9.62E-04 in adenocarcinoma and 5.9E-04 in squamous cell carcinoma, and a threshold of 10 fold overexpression was found to differentiate between small cell adenocarcinoma cancer and normal samples with a P value of 7.14E-05, as checked by Fisher's exact test. The above values demonstrate statistical significance of the results.
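For illustration only, a one-sided Fisher's exact test on a 2x2 table of samples above and below a fold threshold may be computed as sketched below. The count of 11 out of 15 adenocarcinoma samples is taken from the text, and the 12 normal samples correspond to the listed Sample Nos. 47-50, 90-93 and 96-99. The number of normal samples above the threshold is not stated and is assumed here to be 0, and the sidedness of the reported test is likewise not stated; the sketch is therefore not expected to reproduce the reported P values.

```python
from math import comb


def fisher_one_sided(over_cancer, n_cancer, over_normal, n_normal):
    """One-sided Fisher's exact test: probability of observing at least
    `over_cancer` over-expressing samples in the cancer group, given the
    group sizes and the total number of over-expressing samples."""
    total_over = over_cancer + over_normal
    n_total = n_cancer + n_normal
    upper = min(total_over, n_cancer)
    favorable = sum(
        comb(n_cancer, k) * comb(n_normal, total_over - k)
        for k in range(over_cancer, upper + 1)
    )
    return favorable / comb(n_total, total_over)


# 11 of 15 adenocarcinoma samples above threshold (from the text);
# 0 of 12 normal samples above threshold (assumption, not stated in the text).
p_value = fisher_one_sided(11, 15, 0, 12)
print(p_value)
```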

Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: H61775seg8F2 forward primer (SEQ ID NO: 1634); and H61775seg8R2 reverse primer (SEQ ID NO: 1635).

The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: H61775seg8 (SEQ ID NO: 1636).

H61775seg8F2 (SEQ ID NO: 1634)
GAAGGCTCTTGTCACTTACTAGCCAT

H61775seg8R2 (SEQ ID NO: 1635)
TGTCACCATATTTAATCCTCCCAA

H61775seg8 (SEQ ID NO: 1636)
GAAGGCTCTTGTCACTTACTAGCCATGTGATTTTGGAAAGAAACTTAACATTAATTCCTTCAGCTACAATGGAATTCTTGGGAGGATTAAATATGGTGACA

Expression of immunoglobulin superfamily, member 9, H61775 transcripts which are detectable by amplicon as depicted in sequence name H61775seg8 (SEQ ID NO: 1636) in different normal tissues.

Expression of immunoglobulin superfamily, member 9 transcripts detectable by or according to H61775seg8 amplicon (SEQ ID NO: 1636) and H61775seg8F2 (SEQ ID NO: 1634) and H61775seg8R2 (SEQ ID NO: 1635) was measured by real time PCR. In parallel the expression of four housekeeping genes, RPL19 (GenBank Accession No. NM_000981 (SEQ ID NO:1715); RPL19 amplicon, SEQ ID NO:1630), TATA box (GenBank Accession No. NM_003194 (SEQ ID NO:1716); TATA amplicon, SEQ ID NO:1633), Ubiquitin (GenBank Accession No. BC000449 (SEQ ID NO:1711); Ubiquitin-amplicon, SEQ ID NO:328) and SDHA (GenBank Accession No. NM_004168 (SEQ ID NO:1712); SDHA-amplicon, SEQ ID NO:331), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the ovary samples (Sample Nos. 18-20, Table 4, “Tissue sample in normal panel”, above), to obtain a value of relative expression of each sample relative to the median of the ovary samples.
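The normalization described above (division by the geometric mean of the housekeeping gene quantities, followed by division by the median of the reference samples) can be sketched as follows. This is an illustrative outline only; the function names, sample identifiers and quantities are hypothetical and are not taken from the experiments.

```python
from statistics import geometric_mean, median


def normalize(target_qty, housekeeping_qtys):
    """Normalize an amplicon quantity to the geometric mean of the
    housekeeping gene quantities measured in the same RT sample."""
    return target_qty / geometric_mean(housekeeping_qtys)


def relative_expression(normalized, reference_ids):
    """Divide each normalized quantity by the median of the normalized
    quantities of the reference samples (e.g. the ovary samples)."""
    ref_median = median(normalized[s] for s in reference_ids)
    return {sample: qty / ref_median for sample, qty in normalized.items()}


if __name__ == "__main__":
    # Hypothetical raw quantities: (amplicon, [four housekeeping genes]).
    raw = {
        "ref_1": (1.0, [1.0, 1.0, 1.0, 1.0]),
        "ref_2": (2.0, [1.0, 2.0, 1.0, 2.0]),
        "ref_3": (3.0, [2.0, 2.0, 2.0, 2.0]),
        "tissue_x": (12.0, [1.0, 1.0, 4.0, 1.0]),
    }
    normalized = {s: normalize(t, hk) for s, (t, hk) in raw.items()}
    print(relative_expression(normalized, ["ref_1", "ref_2", "ref_3"]))
```

The geometric mean is less sensitive than the arithmetic mean to a single housekeeping gene with an unusually high quantity, which is one common reason for preferring it in this kind of normalization.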

The results are demonstrated in FIG. 8, showing expression of immunoglobulin superfamily, member 9, H61775 transcripts, which are detectable by amplicon as depicted in sequence name H61775seg8 (SEQ ID NO: 1636), in different normal tissues.
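The relationship between the H61775seg8 primer pair and amplicon given above can be checked with a short Python sketch: the amplicon is expected to begin with the forward primer and to end with the reverse complement of the reverse primer. The helper names are hypothetical, and the final "A" of the amplicon, which is set apart by a space in the listing, is read here as part of the sequence (an assumption).

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def primers_flank_amplicon(forward, reverse, amplicon):
    """True if the amplicon starts with the forward primer and ends with
    the reverse complement of the reverse primer."""
    return amplicon.startswith(forward) and amplicon.endswith(
        reverse_complement(reverse)
    )


H61775SEG8_F2 = "GAAGGCTCTTGTCACTTACTAGCCAT"   # SEQ ID NO: 1634
H61775SEG8_R2 = "TGTCACCATATTTAATCCTCCCAA"     # SEQ ID NO: 1635
H61775SEG8 = (                                  # SEQ ID NO: 1636
    "GAAGGCTCTTGTCACTTACTAGCCAT"
    "GTGATTTTGGAAAGAAACTTAACATTAATTCCTTCAGCTACAATGGAATTC"
    "TTGGGAGGATTAAATATGGTGACA"
)

if __name__ == "__main__":
    print(primers_flank_amplicon(H61775SEG8_F2, H61775SEG8_R2, H61775SEG8))
```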

Description for Cluster M85491

Cluster M85491 features 2 transcript(s) and 11 segment(s) of interest, the names for which are given in Tables 20 and 21, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 22.

TABLE 20 Transcripts of interest

Transcript Name         Sequence ID No.
M85491_PEA_1_T16        3
M85491_PEA_1_T20        4

TABLE 21 Segments of interest

Segment Name            Sequence ID No.
M85491_PEA_1_node_0     157
M85491_PEA_1_node_13    158
M85491_PEA_1_node_21    159
M85491_PEA_1_node_23    160
M85491_PEA_1_node_24    161
M85491_PEA_1_node_8     162
M85491_PEA_1_node_9     163
M85491_PEA_1_node_10    164
M85491_PEA_1_node_18    165
M85491_PEA_1_node_19    166
M85491_PEA_1_node_6     167

TABLE 22 Proteins of interest

Protein Name            Sequence ID No.
M85491_PEA_1_P13        1283
M85491_PEA_1_P14        1284

These sequences are variants of the known protein Ephrin type-B receptor 2 [precursor] (SwissProt accession identifier EPB2_HUMAN; known also according to the synonyms EC 2.7.1.112; Tyrosine-protein kinase receptor EPH-3; DRT; Receptor protein-tyrosine kinase HEK5; ERK), SEQ ID NO: 1417, referred to herein as the previously known protein.

Protein Ephrin type-B receptor 2 [precursor] (SEQ ID NO:1417) is known or believed to have the following function(s): Receptor for members of the ephrin-B family. The sequence for protein Ephrin type-B receptor 2 [precursor] is given at the end of the application, as “Ephrin type-B receptor 2 [precursor] amino acid sequence” (SEQ ID NO:1417). Known polymorphisms for this sequence are as shown in Table 23.

TABLE 23 Amino acid mutations for Known Protein

SNP position(s) on amino acid sequence    Comment
671                                       A -> R. /FTId = VAR_004162.
1-20                                      MALRRLGAALLLLPLLAAVE -> MWVPVLALPVCTYA
923                                       E -> K
956                                       L -> V
958                                       V -> L
154                                       G -> D
476                                       K -> KQ
495-496                                   Missing
532                                       E -> D
568                                       R -> RR
589                                       M -> I
788                                       I -> F
853                                       S -> A

Protein Ephrin type-B receptor 2 [precursor] (SEQ ID NO:1417) localization is believed to be Type I membrane protein.

The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: protein amino acid phosphorylation; transmembrane receptor protein tyrosine kinase signaling pathway; neurogenesis, which are annotation(s) related to Biological Process; protein tyrosine kinase; receptor; transmembrane-ephrin receptor; ATP binding; transferase, which are annotation(s) related to Molecular Function; and integral membrane protein, which are annotation(s) related to Cellular Component.

The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <dot expasy dot ch/sprot/>; or Locuslink, available from <dot ncbi dot nlm dot nih dot gov/projects/LocusLink/>.

Cluster M85491 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term “number” in the right hand column of the table and the numbers on the y-axis of FIG. 9 refer to weighted expression of ESTs in each category, as “parts per million” (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
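The “parts per million” ratio described above can be illustrated with a minimal Python sketch. The function name and the EST counts are hypothetical, and the sketch uses plain counts; any weighting applied to the ESTs according to the previously described methods is not reproduced here.

```python
def ests_per_million(cluster_est_count, total_est_count):
    """Ratio of ESTs belonging to a cluster to all ESTs in a tissue
    category, expressed in parts per million (unweighted sketch)."""
    if total_est_count <= 0:
        raise ValueError("total_est_count must be positive")
    return cluster_est_count / total_est_count * 1_000_000


if __name__ == "__main__":
    # Hypothetical counts: 18 cluster ESTs among 500,000 ESTs in a category.
    print(ests_per_million(18, 500_000))
```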

Overall, the following results were obtained as shown with regard to the histograms in FIG. 9 and Table 24. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: epithelial malignant tumors and a mixture of malignant tumors from different tissues.

TABLE 24 Normal tissue distribution

Name of Tissue    Number
Bladder           0
Bone              0
Brain             10
Colon             31
epithelial        10
General           12
Kidney            0
Liver             0
Lung              5
Breast            8
Muscle            5
Ovary             36
pancreas          10
Skin              0
Stomach           0

TABLE 25 P values and ratios for expression in cancerous tissue

Name of Tissue    P1        P2        SP1       R3     SP2       R4
Bladder           5.4e−01   6.0e−01   3.2e−01   2.5    4.6e−01   1.9
Bone              1         2.8e−01   1         1.0    7.0e−01   1.8
Brain             3.4e−01   3.6e−01   1.2e−01   2.9    1.8e−02   2.7
Colon             3.4e−02   5.7e−02   8.2e−02   2.8    2.0e−01   2.1
epithelial        1.7e−03   3.5e−03   2.0e−03   2.8    1.1e−02   2.2
General           4.8e−04   5.2e−04   6.7e−04   2.3    1.3e−03   1.9
Kidney            4.3e−01   3.7e−01   1         1.1    7.0e−01   1.5
Liver             1         4.5e−01   1         1.0    6.9e−01   1.5
Lung              2.2e−01   2.7e−01   6.9e−02   3.6    3.4e−02   3.6
Breast            8.2e−01   7.3e−01   6.9e−01   1.2    6.8e−01   1.2
Muscle            9.2e−01   4.8e−01   1         0.8    1.5e−01   3.2
Ovary             8.5e−01   7.3e−01   9.0e−01   0.7    6.7e−01   1.0
pancreas          5.5e−01   2.0e−01   6.7e−01   1.2    3.5e−01   1.8
Skin              2.9e−01   4.7e−01   1.4e−01   7.0    6.4e−01   1.6
Stomach           1.5e−01   3.2e−01   1         1.0    8.0e−01   1.3

As noted above, cluster M85491 features 2 transcript(s), which were listed in Table 20 above. These transcript(s) encode for protein(s) which are variant(s) of protein Ephrin type-B receptor 2 [precursor] (SEQ ID NO:1417). A description of each variant protein according to the present invention is now provided.

Variant protein M85491_PEA_1_P13 (SEQ ID NO:1283) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M85491_PEA_1_T16 (SEQ ID NO:3). An alignment is given to the known protein (Ephrin type-B receptor 2 [precursor] (SEQ ID NO:1417)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between M85491_PEA_1_P13 (SEQ ID NO:1283) and EPB2_HUMAN (SEQ ID NO:1417):

1. An isolated chimeric polypeptide encoding for M85491_PEA_1_P13 (SEQ ID NO:1283), comprising a first amino acid sequence being at least 90% homologous to MALRRLGAALLLLPLLAAVEETLMDSTTATAELGWMVHPPSGWEEVSGYDENMNTIRTYQVCNVFESSQNNWLRTKFIRRRGAHRIHVEMKFSVRDCSSIPSVPGSCKETFNLYYYEADFDSATKTFPNWMENPWVKVDTIAADESFSQVDLGGRVMKINTEVRSFGPVSRSGFYLAFQDYGGCMSLIAVRVFYRKCPRIIQNGAIFQETLSGAESTSLVAARGSCIANAEEVDVPIKLYCNGDGEWLVPIGRCMCKAGFEAVENGTVCRGCPSGTFKANQGDEACTHCPINSRTTSEGATNCVCRNGYYRADLDPLDMPCTTIPSAPQAVISSVNETSLMLEWTPPRDSGGREDLVYNIICKSCGSGRGACTRCGDNVQYAPRQLGLTEPRIYISDLLAHTQYTFEIQAVNGVTDQSPFSPQFASVNITTNQAAPSAVSIMHQVSRTVDSITLSWSQPDQPNGVILDYELQYYEK corresponding to amino acids 1-476 of EPB2_HUMAN (SEQ ID NO:1417), which also corresponds to amino acids 1-476 of M85491_PEA_1_P13 (SEQ ID NO:1283), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VPIGWVLSPSPTSLRAPLPG (SEQ ID NO: 1755) corresponding to amino acids 477-496 of M85491_PEA_1_P13 (SEQ ID NO:1283), wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of M85491_PEA_1_P13 (SEQ ID NO:1283), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VPIGWVLSPSPTSLRAPLPG (SEQ ID NO: 1755) in M85491_PEA_1_P13 (SEQ ID NO:1283).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein M85491_PEA_1_P13 (SEQ ID NO:1283) is encoded by the following transcript(s): M85491_PEA_1_T16 (SEQ ID NO:3), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M85491_PEA_1_T16 (SEQ ID NO:3) is shown in bold; this coding portion starts at position 143 and ends at position 1630. The transcript also has the following SNPs as listed in Table 26 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M85491_PEA_1_P13 (SEQ ID NO:1283) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 26 Nucleic acid SNPs

SNP position on nucleotide sequence    Alternative nucleic acid    Previously known SNP?
799                                    G −> A                      Yes
1066                                   C −> T                      Yes
1519                                   A −> G                      Yes
1872                                   C −> T                      Yes
2044                                   T −> C                      Yes
2156                                   G −> A                      Yes
2606                                   C −> A                      Yes
2637                                   G −> C                      Yes

Variant protein M85491_PEA_1_P14 (SEQ ID NO:1284) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M85491_PEA_1_T20 (SEQ ID NO:4). An alignment is given to the known protein (Ephrin type-B receptor 2 [precursor] (SEQ ID NO:1417)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between M85491_PEA_1_P14 (SEQ ID NO:1284) and EPB2_HUMAN (SEQ ID NO:1417):

1. An isolated chimeric polypeptide encoding for M85491_PEA_1_P14 (SEQ ID NO:1284), comprising a first amino acid sequence being at least 90% homologous to MALRRLGAALLLLPLLAAVEETLMDSTTATAELGWMVHPPSGWEEVSGYDENMNTIRTYQVCNVFESSQNNWLRTKFIRRRGAHRIHVEMKFSVRDCSSIPSVPGSCKETFNLYYYEADFDSATKTFPNWMENPWVKVDTIAADESFSQVDLGGRVMKINTEVRSFGPVSRSGFYLAFQDYGGCMSLIAVRVFYRKCPRIIQNGAIFQETLSGAESTSLVAARGSCIANAEEVDVPIKLYCNGDGEWLVPIGRCMCKAGFEAVENGTVCR corresponding to amino acids 1-270 of EPB2_HUMAN (SEQ ID NO:1417), which also corresponds to amino acids 1-270 of M85491_PEA_1_P14 (SEQ ID NO:1284), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence ERQDLTMLSRLVLNSWPQMILPPQPPKVLEL (SEQ ID NO: 1756) corresponding to amino acids 271-301 of M85491_PEA_1_P14 (SEQ ID NO:1284), wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of M85491_PEA_1_P14 (SEQ ID NO:1284), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ERQDLTMLSRLVLNSWPQMILPPQPPKVLEL (SEQ ID NO: 1756) in M85491_PEA_1_P14 (SEQ ID NO:1284).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein M85491_PEA_1_P14 (SEQ ID NO:1284) is encoded by the following transcript(s): M85491_PEA_1_T20 (SEQ ID NO:4), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M85491_PEA_1_T20 (SEQ ID NO:4) is shown in bold; this coding portion starts at position 143 and ends at position 1045. The transcript also has the following SNPs as listed in Table 27 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M85491_PEA_1_P14 (SEQ ID NO:1284) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 27 Nucleic acid SNPs

SNP position on nucleotide sequence    Alternative nucleic acid    Previously known SNP?
799                                    G −> A                      Yes
1135                                   T −> C                      Yes
1160                                   T −> C                      Yes
1172                                   A −> C                      Yes
1176                                   T −> A                      Yes
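The coding-portion coordinates stated above for transcripts M85491_PEA_1_T16 and M85491_PEA_1_T20 can be cross-checked against the lengths of the corresponding variant proteins, as in the following Python sketch. The function and variable names are hypothetical, and the sketch assumes that the stated coding portion does not include the stop codon.

```python
def cds_codons(start, end):
    """Number of codons in a coding portion given 1-based inclusive
    start and end positions on the transcript."""
    length = end - start + 1
    if length % 3:
        raise ValueError("coding portion length is not a multiple of 3")
    return length // 3


# Head length = residues shared with EPB2_HUMAN; tail = unique sequence.
VARIANTS = {
    "M85491_PEA_1_P13": {
        "cds": (143, 1630),
        "head": 476,
        "tail": "VPIGWVLSPSPTSLRAPLPG",               # SEQ ID NO: 1755
    },
    "M85491_PEA_1_P14": {
        "cds": (143, 1045),
        "head": 270,
        "tail": "ERQDLTMLSRLVLNSWPQMILPPQPPKVLEL",    # SEQ ID NO: 1756
    },
}


def protein_length(variant):
    """Head residues plus unique tail residues."""
    return variant["head"] + len(variant["tail"])


def is_consistent(variant):
    """True if the codon count of the coding portion equals the
    protein length implied by the comparison report."""
    return cds_codons(*variant["cds"]) == protein_length(variant)


if __name__ == "__main__":
    for name, variant in VARIANTS.items():
        print(name, cds_codons(*variant["cds"]), is_consistent(variant))
```

Under the stated assumption, both coding portions give a codon count equal to the protein length implied by the comparison reports (496 and 301 residues, respectively).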

As noted above, cluster M85491 features 11 segment(s), which were listed in Table 21 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster M85491_PEA_1_node_0 (SEQ ID NO:1028) according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M85491_PEA_1_T16 (SEQ ID NO:3) and M85491_PEA_1_T20 (SEQ ID NO:4). Table 28 below describes the starting and ending position of this segment on each transcript.

TABLE 28 Segment location on transcripts

Transcript name                     Segment starting position    Segment ending position
M85491_PEA_1_T16 (SEQ ID NO: 3)     1                            203
M85491_PEA_1_T20 (SEQ ID NO: 4)     1                            203

Segment cluster M85491_PEA_1_node_13 (SEQ ID NO:1029) according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M85491_PEA_1_T20 (SEQ ID NO:4). Table 29 below describes the starting and ending position of this segment on each transcript.

TABLE 29 Segment location on transcripts

Transcript name                     Segment starting position    Segment ending position
M85491_PEA_1_T20 (SEQ ID NO: 4)     954                          1182

Segment cluster M85491_PEA_1_node_21 (SEQ ID NO:1030) according to the present invention is supported by 18 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M85491_PEA_1_T16 (SEQ ID NO:3). Table 30 below describes the starting and ending position of this segment on each transcript.

TABLE 30 Segment location on transcripts

Transcript name                     Segment starting position    Segment ending position
M85491_PEA_1_T16 (SEQ ID NO: 3)     1110                         1445

Segment cluster M85491_PEA_1_node_23 (SEQ ID NO:1031) according to the present invention is supported by 18 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M85491_PEA_1_T16 (SEQ ID NO:3). Table 31 below describes the starting and ending position of this segment on each transcript.

TABLE 31 Segment location on transcripts

Transcript name                     Segment starting position    Segment ending position
M85491_PEA_1_T16 (SEQ ID NO: 3)     1446                         1570

Segment cluster M85491_PEA_1_node_24 (SEQ ID NO:1032) according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M85491_PEA_1_T16 (SEQ ID NO:3). Table 32 below describes the starting and ending position of this segment on each transcript.

TABLE 32 Segment location on transcripts

Transcript name                     Segment starting position    Segment ending position
M85491_PEA_1_T16 (SEQ ID NO: 3)     1571                         2875

Segment cluster M85491_PEA_1_node_8 (SEQ ID NO:1033) according to the present invention is supported by 25 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M85491_PEA_1_T16 (SEQ ID NO:3) and M85491_PEA_1_T20 (SEQ ID NO:4). Table 33 below describes the starting and ending position of this segment on each transcript.

TABLE 33 Segment location on transcripts

Transcript name                     Segment starting position    Segment ending position
M85491_PEA_1_T16 (SEQ ID NO: 3)     269                          672
M85491_PEA_1_T20 (SEQ ID NO: 4)     269                          672

Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment (in relation to lung cancer), shown in Table 34.

TABLE 34 Oligonucleotides related to this segment

Oligonucleotide name               Overexpressed in cancers    Chip reference
M85491_0_14_0 (SEQ ID NO: 206)     lung malignant tumors       LUN

Segment cluster M85491_PEA_1_node_9 (SEQ ID NO:1034) according to the present invention is supported by 20 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M85491_PEA_1_T16 (SEQ ID NO:3) and M85491_PEA_1_T20 (SEQ ID NO:4). Table 35 below describes the starting and ending position of this segment on each transcript.

TABLE 35 Segment location on transcripts

Transcript name                     Segment starting position    Segment ending position
M85491_PEA_1_T16 (SEQ ID NO: 3)     673                          856
M85491_PEA_1_T20 (SEQ ID NO: 4)     673                          856

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.

Segment cluster M85491_PEA_1_node_10 (SEQ ID NO:1035) according to the present invention is supported by 17 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M85491_PEA_1_T16 (SEQ ID NO:3) and M85491_PEA_1_T20 (SEQ ID NO:4). Table 36 below describes the starting and ending position of this segment on each transcript.

TABLE 36 Segment location on transcripts

Transcript name                     Segment starting position    Segment ending position
M85491_PEA_1_T16 (SEQ ID NO: 3)     857                          953
M85491_PEA_1_T20 (SEQ ID NO: 4)     857                          953

Segment cluster M85491_PEA_1_node_18 (SEQ ID NO:1036) according to the present invention is supported by 15 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M85491_PEA_1_T16 (SEQ ID NO:3). Table 37 below describes the starting and ending position of this segment on each transcript.

TABLE 37 Segment location on transcripts

Transcript name                     Segment starting position    Segment ending position
M85491_PEA_1_T16 (SEQ ID NO: 3)     954                          1044

Segment cluster M85491_PEA_1_node_19 (SEQ ID NO:1037) according to the present invention is supported by 15 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M85491_PEA_1_T16 (SEQ ID NO:3). Table 38 below describes the starting and ending position of this segment on each transcript.

TABLE 38 Segment location on transcripts

Transcript name                     Segment starting position    Segment ending position
M85491_PEA_1_T16 (SEQ ID NO: 3)     1045                         1109

Segment cluster M85491_PEA_1_node_6 (SEQ ID NO:1038) according to the present invention is supported by 11 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M85491_PEA_1_T16 (SEQ ID NO:3) and M85491_PEA_1_T20 (SEQ ID NO:4). Table 39 below describes the starting and ending position of this segment on each transcript.

TABLE 39 Segment location on transcripts

Transcript name                     Segment starting position    Segment ending position
M85491_PEA_1_T16 (SEQ ID NO: 3)     204                          268
M85491_PEA_1_T20 (SEQ ID NO: 4)     204                          268

Variant protein alignment to the previously known protein:

Sequence name: /tmp/qfmsU9VtxS/DylcLC9j8v:EPB2_HUMAN (SEQ ID NO:1417)
Sequence documentation:
Alignment of: M85491_PEA_1_P13 (SEQ ID NO:1283) x EPB2_HUMAN (SEQ ID NO:1417)
Alignment segment 1/1:
Quality: 4726.00
Escore: 0
Matching length: 476
Total length: 476
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment: (alignment rows not reproduced)

Sequence name: /tmp/rmnzuDbot6/GiHbjeU8iR:EPB2_HUMAN (SEQ ID NO:1417)
Sequence documentation:
Alignment of: M85491_PEA_1_P14 (SEQ ID NO:1284) x EPB2_HUMAN (SEQ ID NO:1417)
Alignment segment 1/1:
Quality: 2673.00
Escore: 0
Matching length: 270
Total length: 270
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment: (alignment rows not reproduced)
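The segment coordinates reported in Tables 28 to 39 above can be checked for internal consistency: on each transcript, the segments are expected to follow one another without gaps or overlaps, starting at position 1. The following Python sketch performs that check. The names are hypothetical, and the segment at positions 1045-1109 is labelled node_19 here, following the ordering of Table 21, which is an assumption.

```python
def tiles_contiguously(segments):
    """True if the (start, end) intervals, sorted by start, begin at
    position 1 and each interval starts right after the previous one."""
    ordered = sorted(segments.values())
    if ordered[0][0] != 1:
        return False
    return all(nxt[0] == prev[1] + 1 for prev, nxt in zip(ordered, ordered[1:]))


# Positions on M85491_PEA_1_T16 (SEQ ID NO: 3).
T16 = {
    "node_0": (1, 203),
    "node_6": (204, 268),
    "node_8": (269, 672),
    "node_9": (673, 856),
    "node_10": (857, 953),
    "node_18": (954, 1044),
    "node_19": (1045, 1109),   # label assumed, see lead-in
    "node_21": (1110, 1445),
    "node_23": (1446, 1570),
    "node_24": (1571, 2875),
}

# Positions on M85491_PEA_1_T20 (SEQ ID NO: 4).
T20 = {
    "node_0": (1, 203),
    "node_6": (204, 268),
    "node_8": (269, 672),
    "node_9": (673, 856),
    "node_10": (857, 953),
    "node_13": (954, 1182),
}

if __name__ == "__main__":
    print(tiles_contiguously(T16), tiles_contiguously(T20))
```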

Expression of Ephrin type-B receptor 2 precursor (EC 2.7.1.112) (Tyrosine-protein kinase receptor EPH-3) M85491 transcripts which are detectable by amplicon as depicted in sequence name M85491seg24 (SEQ ID NO: 1639) in normal and cancerous lung tissues

Expression of Ephrin type-B receptor 2 precursor (EC 2.7.1.112) (Tyrosine-protein kinase receptor EPH-3) transcripts detectable by or according to seg24, M85491seg24 amplicon (SEQ ID NO: 1639) and M85491seg24F (SEQ ID NO: 1637) and M85491seg24R (SEQ ID NO: 1638) primers was measured by real time PCR. In parallel the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323 (SEQ ID NO:1713); PBGD-amplicon, SEQ ID NO:334), HPRT1 (GenBank Accession No. NM_000194 (SEQ ID NO:1714); HPRT1-amplicon, SEQ ID NO:1297), Ubiquitin (GenBank Accession No. BC000449 (SEQ ID NO:1711); Ubiquitin-amplicon, SEQ ID NO:328) and SDHA (GenBank Accession No. NM_004168 (SEQ ID NO:1712); SDHA-amplicon, SEQ ID NO:331), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 47-50, 90-93, 96-99, Table 2 above, “Tissue samples in testing panel”), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples.

FIG. 10 below is a histogram showing overexpression of the above-indicated Ephrin type-B receptor 2 precursor (EC 2.7.1.112) (Tyrosine-protein kinase receptor EPH-3) transcripts in cancerous lung samples relative to the normal samples. Values represent the average of duplicate experiments. Error bars indicate the minimal and maximal values obtained. The number and percentage of samples that exhibit at least 3 fold overexpression, out of the total number of samples tested, is indicated at the bottom.

As is evident from FIG. 10, the expression of Ephrin type-B receptor 2 precursor (EC 2.7.1.112) (Tyrosine-protein kinase receptor EPH-3) transcripts detectable by the above amplicon in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 47-50, 90-93, 96-99, Table 2, “Tissue samples in testing panel”). Notably an overexpression of at least 3 fold was found in 9 out of 15 adenocarcinoma samples and in 4 out of 8 small cell carcinoma samples.
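The counting of samples that reach a given fold-overexpression threshold, as summarized above and at the bottom of the histogram, can be sketched as follows. The function name and the fold values are hypothetical and serve only to illustrate the calculation.

```python
def count_over_threshold(fold_values, threshold):
    """Return (hits, total, percentage) for the samples whose fold
    up-regulation is at least the given threshold."""
    hits = sum(1 for value in fold_values if value >= threshold)
    total = len(fold_values)
    return hits, total, 100.0 * hits / total


if __name__ == "__main__":
    # Hypothetical fold up-regulation values for one tumor subtype.
    folds = [0.8, 3.2, 5.1, 1.9, 7.4, 2.6, 3.0, 4.4]
    print(count_over_threshold(folds, 3.0))
```

Applied to the counts reported above, 9 out of 15 adenocarcinoma samples corresponds to 60% and 4 out of 8 small cell carcinoma samples corresponds to 50%.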

Statistical analysis was applied to verify the significance of these results, as described below.

A threshold of 3 fold overexpression was found to differentiate between cancer and normal samples with P value of 7.42E-03 in adenocarcinoma and 5.69E-02 in small cell carcinoma as checked by exact Fisher test. The above values demonstrate statistical significance of the results.

Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: M85491seg24F forward primer (SEQ ID NO: 1637); and M85491seg24R reverse primer (SEQ ID NO: 1638).

The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: M85491seg24 (SEQ ID NO: 1639).

M85491seg24F (SEQ ID NO: 1637)
GGCGTCTTTCTCCCTCTGAAC

M85491seg24R (SEQ ID NO: 1638)
GTCCCATTCTGGGTGCTGTG

M85491seg24 (SEQ ID NO: 1639)
GGCGTCTTTCTCCCTCTGAACCTCAGTTTCCACCTGTGTCGAGTGTGGGTGAGACCCCTCGCGGGGAGCTATGCAGGTTACGGAGAAAAGGCAGCACAGCACCCAGAATGGGAC

Expression of Ephrin type-B receptor 2 precursor (EC 2.7.1.112) (Tyrosine-protein kinase receptor EPH-3) M85491 transcripts which are detectable by amplicon as depicted in sequence name M85491seg24 (SEQ ID NO: 1639) in different normal tissues

Expression of Ephrin type-B receptor 2 precursor transcripts detectable by or according to M85491seg24 amplicon (SEQ ID NO: 1639) and M85491seg24F (SEQ ID NO: 1637) and M85491seg24R (SEQ ID NO: 1638) was measured by real time PCR. In parallel the expression of four housekeeping genes, RPL19 (GenBank Accession No. NM_000981 (SEQ ID NO:1715); RPL19 amplicon, SEQ ID NO:1630), TATA box (GenBank Accession No. NM_003194 (SEQ ID NO:1716); TATA amplicon, SEQ ID NO:1633), Ubiquitin (GenBank Accession No. BC000449 (SEQ ID NO:1711); Ubiquitin-amplicon, SEQ ID NO:328) and SDHA (GenBank Accession No. NM_004168 (SEQ ID NO:1712); SDHA-amplicon, SEQ ID NO:331), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the lung samples (Sample Nos. 15-17, Table 2, “Tissue sample on normal panel”, above), to obtain a value of relative expression of each sample relative to the median of the lung samples.

The results are shown in FIG. 11, demonstrating the expression of Ephrin type-B receptor 2 precursor (Tyrosine-protein kinase receptor EPH-3) M85491 transcripts which are detectable by amplicon as depicted in sequence name M85491seg24 (SEQ ID NO: 1639) in different normal tissues.

Description for Cluster T39971

Cluster T39971 features 4 transcript(s) and 28 segment(s) of interest, the names for which are given in Tables 40 and 41, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 42.

TABLE 40 Transcripts of interest

Transcript Name    Sequence ID No.
T39971_T10         5
T39971_T12         6
T39971_T16         7
T39971_T5          8

TABLE 41 Segments of interest

Segment Name       Sequence ID No.
T39971_node_0      168
T39971_node_18     169
T39971_node_21     170
T39971_node_22     171
T39971_node_23     172
T39971_node_31     173
T39971_node_33     174
T39971_node_7      175
T39971_node_1      176
T39971_node_10     177
T39971_node_11     178
T39971_node_12     179
T39971_node_15     180
T39971_node_16     181
T39971_node_17     182
T39971_node_26     183
T39971_node_27     184
T39971_node_28     185
T39971_node_29     186
T39971_node_3      187
T39971_node_30     188
T39971_node_34     189
T39971_node_35     190
T39971_node_36     191
T39971_node_4      192
T39971_node_5      193
T39971_node_8      194
T39971_node_9      195

TABLE 42 Proteins of interest

Protein Name       Sequence ID No.
T39971_P6          1285
T39971_P9          1286
T39971_P11         1287
T39971_P12         1288

These sequences are variants of the known protein Vitronectin precursor (SwissProt accession identifier VTNC_HUMAN; known also according to the synonyms Serum spreading factor; S-protein; V75), SEQ ID NO: 1418, referred to herein as the previously known protein.

Protein Vitronectin precursor (SEQ ID NO:1418) is known or believed to have the following function(s): Vitronectin is a cell adhesion and spreading factor found in serum and tissues. Vitronectin interacts with glycosaminoglycans and proteoglycans. Is recognized by certain members of the integrin family and serves as a cell-to-substrate adhesion molecule. Inhibitor of the membrane-damaging effect of the terminal cytolytic complement pathway. The sequence for protein Vitronectin precursor is given at the end of the application, as “Vitronectin precursor amino acid sequence”. Known polymorphisms for this sequence are as shown in Table 43.

TABLE 43 Amino acid mutations for Known Protein

SNP position(s) on amino acid sequence    Comment
122                                       A −> S./FTId = VAR_012983.
268                                       R −> Q./FTId = VAR_012984.
400                                       T −> M./FTId = VAR_012985.
50                                        C −> N
225                                       S −> N
366                                       A −> T

Protein Vitronectin precursor (SEQ ID NO:1418) localization is believed to be Extracellular.

The previously known protein also has the following indication(s) and/or potential therapeutic use(s): Cancer, melanoma. It has been investigated for clinical/therapeutic use in humans, for example as a target for an antibody or small molecule, and/or as a direct therapeutic; available information related to these investigations is as follows. Potential pharmaceutically related or therapeutically related activity or activities of the previously known protein are as follows: Alphavbeta3 integrin antagonist; Apoptosis agonist. A therapeutic role for a protein represented by the cluster has been predicted. The cluster was assigned this field because there was information in the drug database or the public databases (e.g., described herein above) that this protein, or part thereof, is used or can be used for a potential therapeutic indication: Anticancer.

The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: immune response; cell adhesion, which are annotation(s) related to Biological Process; protein binding; heparin binding, which are annotation(s) related to Molecular Function; and extracellular space, which are annotation(s) related to Cellular Component.

The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <dot expasy dot ch/sprot/>; or Locuslink, available from <dot ncbi dot nlm dot nih dot gov/projects/LocusLink/>.

Cluster T39971 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term “number” in the right hand column of the table and the numbers on the y-axis of FIG. 12 refer to weighted expression of ESTs in each category, as “parts per million” (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
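The “parts per million” figure described above can be illustrated with a minimal sketch. The function name and the example counts are hypothetical and are not taken from the tables herein; any weighting applied to the EST counts is assumed to have been applied before the ratio is taken, since the weighting scheme is described elsewhere.

```python
def expression_ppm(cluster_est_count: float, total_est_count: float) -> float:
    """Express the ESTs of one cluster as parts per million of all ESTs
    in the same tissue category.

    Both arguments are assumed to be (already weighted) EST counts.
    """
    if total_est_count <= 0:
        raise ValueError("total_est_count must be positive")
    return cluster_est_count / total_est_count * 1_000_000


# Hypothetical counts, chosen only to illustrate the arithmetic:
# 5 cluster ESTs among 200,000 ESTs in the category.
example_ppm = expression_ppm(cluster_est_count=5, total_est_count=200_000)
print(round(example_ppm, 1))
```

A value obtained this way is comparable across tissue categories of different library sizes, which is presumably why the histograms and the table use it.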

Overall, the following results were obtained as shown with regard to the histograms in FIG. 12 and Table 44. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: liver cancer, lung malignant tumors and pancreas carcinoma.

TABLE 44 Normal tissue distribution
Name of Tissue    Number
adrenal           60
bladder           0
Bone              0
Brain             9
Colon             0
epithelial        79
general           29
Liver             2164
Lung              0
Lymph nodes       0
Breast            0
pancreas          0
prostate          0
Skin              0
Uterus            0

TABLE 45 P values and ratios for expression in cancerous tissue
Name of Tissue    P1         P2         SP1        R3     SP2        R4
adrenal           6.9e−01    7.4e−01    2.0e−02    2.3    5.3e−02    1.8
bladder           5.4e−01    6.0e−01    5.6e−01    1.8    6.8e−01    1.5
Bone              1          6.7e−01    1          1.0    7.0e−01    1.4
Brain             8.0e−01    8.6e−01    3.0e−01    1.9    5.3e−01    1.2
Colon             4.2e−01    4.8e−01    7.0e−01    1.6    7.7e−01    1.4
epithelial        6.6e−01    5.7e−01    1.0e−01    0.8    8.7e−01    0.6
general           5.1e−01    3.8e−01    9.2e−08    1.6    8.3e−04    1.3
Liver             1          6.7e−01    2.3e−03    0.3    1          0.2
Lung              2.4e−01    9.1e−02    1.7e−01    4.3    8.1e−03    5.0
Lymph nodes       1          5.7e−01    1          1.0    5.8e−01    2.3
Breast            1          6.7e−01    1          1.0    8.2e−01    1.2
pancreas          9.5e−02    1.8e−01    1.5e−11    6.5    8.2e−09    4.6
prostate          7.3e−01    6.0e−01    6.7e−01    1.5    5.6e−01    1.7
Skin              1          4.4e−01    1          1.0    6.4e−01    1.6
Uterus            5.0e−01    2.6e−01    1          1.1    8.0e−01    1.4

As noted above, cluster T39971 features 4 transcript(s), which were listed in Table 40 above. These transcript(s) encode for protein(s) which are variant(s) of protein Vitronectin precursor (SEQ ID NO:1418). A description of each variant protein according to the present invention is now provided.

Variant protein T39971_P6 (SEQ ID NO:1285) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T39971_T5 (SEQ ID NO:8). An alignment is given to the known protein (Vitronectin precursor (SEQ ID NO:1418)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between T39971_P6 (SEQ ID NO:1285) and VTNC_HUMAN (SEQ ID NO:1418):

1. An isolated chimeric polypeptide encoding for T39971_P6 (SEQ ID NO:1285), comprising a first amino acid sequence being at least 90% homologous to MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSCCTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTSDLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPPAEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVWGIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGVLDPDYPRNISDGFDGIPDNVDAALALPAHSYSGRERVYFFKG corresponding to amino acids 1-276 of VTNC_HUMAN (SEQ ID NO:1418), which also corresponds to amino acids 1-276 of T39971_P6 (SEQ ID NO:1285), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TQGVVGD (SEQ ID NO: 1757) corresponding to amino acids 277-283 of T39971_P6 (SEQ ID NO:1285), wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of T39971_P6 (SEQ ID NO:1285), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TQGVVGD (SEQ ID NO: 1757) in T39971_P6 (SEQ ID NO:1285).
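The percentage thresholds recited for the tail can be made concrete with a simplified sketch. It computes plain percent identity between two ungapped peptides of equal length, which is only a stand-in: the method actually used to determine homology is not stated in this section, and alignment-based tools may score substitutions differently. The function name and the candidate peptide are hypothetical.

```python
def percent_identity(reference: str, candidate: str) -> float:
    """Percent of positions at which two equal-length peptides carry
    the same residue (no gaps, no substitution scoring)."""
    if not reference or len(reference) != len(candidate):
        raise ValueError("peptides must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(reference, candidate))
    return 100.0 * matches / len(reference)


TAIL_P6 = "TQGVVGD"    # tail of T39971_P6 (SEQ ID NO: 1757), from the text
CANDIDATE = "TQGAVGD"  # hypothetical peptide with one substitution

identity = percent_identity(TAIL_P6, CANDIDATE)
print(f"{identity:.1f}% identical; meets 70% threshold: {identity >= 70}")
```

For a seven-residue tail, one mismatch leaves six of seven positions identical, so under this simplified measure the 70%, 80% and 85% thresholds are met while the 90% and 95% thresholds are not.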

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein T39971_P6 (SEQ ID NO:1285) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 46, (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T39971_P6 (SEQ ID NO:1285) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 46 Amino acid mutations
SNP position(s) on amino acid sequence    Alternative amino acid(s)    Previously known SNP?
122                                       A −> S                       Yes
145                                       G −>                         No
268                                       R −> Q                       Yes
280                                       V −> A                       Yes
180                                       C −>                         No
180                                       C −> W                       No
192                                       Y −>                         No
209                                       A −>                         No
211                                       T −>                         No
267                                       G −>                         No
267                                       G −> A                       No
268                                       R −>                         No

Variant protein T39971_P6 (SEQ ID NO:1285) is encoded by the following transcript(s): T39971_T5 (SEQ ID NO:8), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T39971_T5 (SEQ ID NO:8) is shown in bold; this coding portion starts at position 756 and ends at position 1604. The transcript also has the following SNPs as listed in Table 47 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T39971_P6 (SEQ ID NO:1285) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 47 Nucleic acid SNPs
SNP position on nucleotide sequence    Alternative nucleic acid    Previously known SNP?
417                                    G −> C                      Yes
459                                    T −> C                      Yes
1387                                   C −>                        No
1406                                   −> A                        No
1406                                   −> G                        No
1555                                   G −>                        No
1555                                   G −> C                      No
1558                                   G −>                        No
1558                                   G −> A                      Yes
1594                                   T −> C                      Yes
1642                                   T −> C                      Yes
1770                                   C −> T                      Yes
529                                    G −> T                      Yes
1982                                   A −> G                      No
2007                                   G −>                        No
2029                                   T −> C                      No
2094                                   T −> C                      No
2117                                   C −> G                      No
2123                                   C −> T                      Yes
2152                                   C −> T                      Yes
2182                                   G −> T                      No
2185                                   A −> C                      No
2297                                   T −> C                      Yes
1119                                   G −> T                      Yes
2411                                   G −>                        No
2411                                   G −> T                      No
2487                                   T −> C                      Yes
1188                                   G −>                        No
1295                                   C −>                        No
1295                                   C −> G                      No
1324                                   −> T                        No
1331                                   C −>                        No
1381                                   C −>                        No
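The relationship between nucleotide SNP positions on the transcript and amino acid SNP positions on the variant protein can be sketched as follows. The sketch assumes 1-based positions, a coding portion that begins with the first nucleotide of codon 1 at position 756, and no frame changes; the function name is hypothetical.

```python
from typing import Optional


def amino_acid_position(nt_position: int, cds_start: int, cds_end: int) -> Optional[int]:
    """Map a 1-based transcript position to the 1-based amino acid
    position whose codon contains it.

    Returns None when the nucleotide lies outside the coding portion.
    """
    if not cds_start <= nt_position <= cds_end:
        return None
    return (nt_position - cds_start) // 3 + 1


# Coding portion of T39971_T5 as stated in the text.
CDS_START, CDS_END = 756, 1604

for nt in (417, 1119, 1558, 1594, 1642):
    print(nt, amino_acid_position(nt, CDS_START, CDS_END))
```

Under these assumptions, nucleotide positions 1119, 1558 and 1594 fall in codons 122, 268 and 280, which are amino acid positions listed for this variant, while positions 417 and 1642 fall outside the coding portion.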

Variant protein T39971_P9 (SEQ ID NO:1286) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T39971_T10 (SEQ ID NO:5). An alignment is given to the known protein (Vitronectin precursor (SEQ ID NO:1418)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between T39971_P9 (SEQ ID NO:1286) and VTNC_HUMAN (SEQ ID NO:1418):

1. An isolated chimeric polypeptide encoding for T39971_P9 (SEQ ID NO:1286), comprising a first amino acid sequence being at least 90% homologous to MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSCCTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTSDLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPPAEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVWGIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGVLDPDYPRNISDGFDGIPDNVDAALALPAHSYSGRERVYFFKGKQYWEYQFQHQPSQEECEGSSLSAVFEHFAMMQRDSWEDIFELLFWGRT corresponding to amino acids 1-325 of VTNC_HUMAN (SEQ ID NO:1418), which also corresponds to amino acids 1-325 of T39971_P9 (SEQ ID NO:1286), and a second amino acid sequence being at least 90% homologous to SGMAPRPSLAKKQRFRHRNRKGYRSQRGHSRGRNQNSRRPSRATWLSLFSSEESNLGANNYDDYRMDWLVPATCEPIQSVFFFSGDKYYRVNLRTRRVDTVDPPYPRSIAQYWLGCPAPGHL corresponding to amino acids 357-478 of VTNC_HUMAN (SEQ ID NO:1418), which also corresponds to amino acids 326-447 of T39971_P9 (SEQ ID NO:1286), wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated chimeric polypeptide encoding for an edge portion of T39971_P9 (SEQ ID NO:1286), comprising a polypeptide having a length “n”, wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise TS, having a structure as follows: a sequence starting from any of amino acid numbers 325−x to 325; and ending at any of amino acid numbers 326+((n−2)−x), in which x varies from 0 to n−2.
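The edge-portion definition above can be read as a family of windows of length n that each span the junction between amino acids 325 and 326. The following sketch enumerates those windows under that reading, taking the start as 325−x and the end as 326+((n−2)−x) for each x from 0 to n−2. The function name is hypothetical, and the bounds check against the protein length (447 amino acids for T39971_P9) is an added assumption rather than part of the recited structure.

```python
from typing import List, Tuple


def edge_windows(junction_left: int, n: int, protein_length: int) -> List[Tuple[int, int]]:
    """Enumerate 1-based, inclusive (start, end) windows of length n that
    contain both residues flanking a junction.

    junction_left is the last residue before the junction (325 here);
    the first residue after it is junction_left + 1.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to span the junction")
    junction_right = junction_left + 1
    windows = []
    for x in range(0, n - 1):  # x = 0 .. n-2
        start = junction_left - x
        end = junction_right + ((n - 2) - x)
        # Assumption: discard windows that would run past either end.
        if start >= 1 and end <= protein_length:
            windows.append((start, end))
    return windows


for start, end in edge_windows(junction_left=325, n=10, protein_length=447):
    print(start, end, end - start + 1)
```

Every window produced has length n and includes positions 325 and 326, so each corresponding peptide contains the two junction residues.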

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein T39971_P9 (SEQ ID NO:1286) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 48, (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T39971_P9 (SEQ ID NO:1286) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 48 Amino acid mutations
SNP position(s) on amino acid sequence    Alternative amino acid(s)    Previously known SNP?
122                                       A −> S                       Yes
145                                       G −>                         No
268                                       R −> Q                       Yes
328                                       M −> T                       No
350                                       S −> P                       No
369                                       T −> M                       Yes
379                                       S −> I                       No
380                                       N −> T                       No
180                                       C −>                         No
180                                       C −> W                       No
192                                       Y −>                         No
209                                       A −>                         No
211                                       T −>                         No
267                                       G −>                         No
267                                       G −> A                       No
268                                       R −>                         No

Variant protein T39971_P9 (SEQ ID NO:1286) is encoded by the following transcript(s): T39971_T10 (SEQ ID NO:5), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T39971_T10 (SEQ ID NO:5) is shown in bold; this coding portion starts at position 756 and ends at position 2096. The transcript also has the following SNPs as listed in Table 49 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T39971_P9 (SEQ ID NO:1286) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 49 Nucleic acid SNPs
SNP position on nucleotide sequence    Alternative nucleic acid    Previously known SNP?
417                                    G −> C                      Yes
459                                    T −> C                      Yes
1387                                   C −>                        No
1406                                   −> A                        No
1406                                   −> G                        No
1555                                   G −>                        No
1555                                   G −> C                      No
1558                                   G −>                        No
1558                                   G −> A                      Yes
1738                                   T −> C                      No
1803                                   T −> C                      No
1826                                   C −> G                      No
529                                    G −> T                      Yes
1832                                   C −> T                      Yes
1861                                   C −> T                      Yes
1891                                   G −> T                      No
1894                                   A −> C                      No
2006                                   T −> C                      Yes
2120                                   G −>                        No
2120                                   G −> T                      No
2196                                   T −> C                      Yes
1119                                   G −> T                      Yes
1188                                   G −>                        No
1295                                   C −>                        No
1295                                   C −> G                      No
1324                                   −> T                        No
1331                                   C −>                        No
1381                                   C −>                        No

Variant protein T39971_P11 (SEQ ID NO:1287) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T39971_T12 (SEQ ID NO:6). An alignment is given to the known protein (Vitronectin precursor (SEQ ID NO:1418)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between T39971_P11 (SEQ ID NO:1287) and VTNC_HUMAN (SEQ ID NO:1418):

1. An isolated chimeric polypeptide encoding for T39971_P11 (SEQ ID NO:1287), comprising a first amino acid sequence being at least 90% homologous to MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSCCTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTSDLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPPAEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVWGIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGVLDPDYPRNISDGFDGIPDNVDAALALPAHSYSGRERVYFFKGKQYWEYQFQHQPSQEECEGSSLSAVFEHFAMMQRDSWEDIFELLFWGRTS corresponding to amino acids 1-326 of VTNC_HUMAN (SEQ ID NO:1418), which also corresponds to amino acids 1-326 of T39971_P11 (SEQ ID NO:1287), and a second amino acid sequence being at least 90% homologous to DKYYRVNLRTRRVDTVDPPYPRSIAQYWLGCPAPGHL corresponding to amino acids 442-478 of VTNC_HUMAN (SEQ ID NO:1418), which also corresponds to amino acids 327-363 of T39971_P11 (SEQ ID NO:1287), wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated chimeric polypeptide encoding for an edge portion of T39971_P11 (SEQ ID NO:1287), comprising a polypeptide having a length “n”, wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise SD, having a structure as follows: a sequence starting from any of amino acid numbers 326−x to 326; and ending at any of amino acid numbers 327+((n−2)−x), in which x varies from 0 to n−2.

Comparison report between T39971_P11 (SEQ ID NO:1287) and Q9BSH7 (SEQ ID NO:1696):

1. An isolated chimeric polypeptide encoding for T39971_P11 (SEQ ID NO:1287), comprising a first amino acid sequence being at least 90% homologous to MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSCCTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTSDLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPPAEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVWGIEGPIDAAFTRINCQGKTYLFKGSQYWRFEDGVLDPDYPRNISDGFDGIPDNVDAALALPAHSYSGRERVYFFKGKQYWEYQFQHQPSQEECEGSSLSAVFEHFAMMQRDSWEDIFELLFWGRTS corresponding to amino acids 1-326 of Q9BSH7, which also corresponds to amino acids 1-326 of T39971_P11 (SEQ ID NO:1287), and a second amino acid sequence being at least 90% homologous to DKYYRVNLRTRRVDTVDPPYPRSIAQYWLGCPAPGHL corresponding to amino acids 442-478 of Q9BSH7, which also corresponds to amino acids 327-363 of T39971_P11 (SEQ ID NO:1287), wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated chimeric polypeptide encoding for an edge portion of T39971_P11 (SEQ ID NO:1287), comprising a polypeptide having a length “n”, wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise SD, having a structure as follows: a sequence starting from any of amino acid numbers 326−x to 326; and ending at any of amino acid numbers 327+((n−2)−x), in which x varies from 0 to n−2.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein T39971_P11 (SEQ ID NO:1287) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 50, (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T39971_P11 (SEQ ID NO:1287) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 50 Amino acid mutations
SNP position(s) on amino acid sequence    Alternative amino acid(s)    Previously known SNP?
122                                       A −> S                       Yes
145                                       G −>                         No
268                                       R −> Q                       Yes
180                                       C −>                         No
180                                       C −> W                       No
192                                       Y −>                         No
209                                       A −>                         No
211                                       T −>                         No
267                                       G −>                         No
267                                       G −> A                       No
268                                       R −>                         No

Variant protein T39971_P11 (SEQ ID NO:1287) is encoded by the following transcript(s): T39971_T12 (SEQ ID NO:6), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T39971_T12 (SEQ ID NO:6) is shown in bold; this coding portion starts at position 756 and ends at position 1844. The transcript also has the following SNPs as listed in Table 51 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T39971_P11 (SEQ ID NO:1287) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 51 Nucleic acid SNPs
SNP position on nucleotide sequence    Alternative nucleic acid    Previously known SNP?
417                                    G −> C                      Yes
459                                    T −> C                      Yes
1387                                   C −>                        No
1406                                   −> A                        No
1406                                   −> G                        No
1555                                   G −>                        No
1555                                   G −> C                      No
1558                                   G −>                        No
1558                                   G −> A                      Yes
1754                                   T −> C                      Yes
1868                                   G −>                        No
1868                                   G −> T                      No
529                                    G −> T                      Yes
1944                                   T −> C                      Yes
1119                                   G −> T                      Yes
1188                                   G −>                        No
1295                                   C −>                        No
1295                                   C −> G                      No
1324                                   −> T                        No
1331                                   C −>                        No
1381                                   C −>                        No

Variant protein T39971_P12 (SEQ ID NO:1288) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T39971_T16 (SEQ ID NO:7). An alignment is given to the known protein (Vitronectin precursor (SEQ ID NO:1418)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between T39971_P12 (SEQ ID NO:1288) and VTNC_HUMAN (SEQ ID NO:1418):

1. An isolated chimeric polypeptide encoding for T39971_P12 (SEQ ID NO:1288), comprising a first amino acid sequence being at least 90% homologous to MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSCCTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTSDLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPPAEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVWGIEGPIDAAFTRINCQGKTYLFK corresponding to amino acids 1-223 of VTNC_HUMAN (SEQ ID NO:1418), which also corresponds to amino acids 1-223 of T39971_P12 (SEQ ID NO:1288), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VPGAVGQGRKHLGRV (SEQ ID NO: 1758) corresponding to amino acids 224-238 of T39971_P12 (SEQ ID NO:1288), wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of T39971_P12 (SEQ ID NO:1288), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VPGAVGQGRKHLGRV (SEQ ID NO: 1758) in T39971_P12 (SEQ ID NO:1288).

Comparison report between T39971_P12 (SEQ ID NO:1288) and Q9BSH7:

1. An isolated chimeric polypeptide encoding for T39971_P12 (SEQ ID NO:1288), comprising a first amino acid sequence being at least 90% homologous to MAPLRPLLILALLAWVALADQESCKGRCTEGFNVDKKCQCDELCSYYQSCCTDYTAECKPQVTRGDVFTMPEDEYTVYDDGEEKNNATVHEQVGGPSLTSDLQAQSKGNPEQTPVLKPEEEAPAPEVGASKPEGIDSRPETLHPGRPQPPAEEELCSGKPFDAFTDLKNGSLFAFRGQYCYELDEKAVRPGYPKLIRDVWGIEGPIDAAFTRINCQGKTYLFK corresponding to amino acids 1-223 of Q9BSH7, which also corresponds to amino acids 1-223 of T39971_P12 (SEQ ID NO:1288), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VPGAVGQGRKHLGRV (SEQ ID NO: 1758) corresponding to amino acids 224-238 of T39971_P12 (SEQ ID NO:1288), wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of T39971_P12 (SEQ ID NO:1288), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VPGAVGQGRKHLGRV (SEQ ID NO: 1758) in T39971_P12 (SEQ ID NO:1288).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein T39971_P12 (SEQ ID NO:1288) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 52, (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T39971_P12 (SEQ ID NO:1288) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 52 Amino acid mutations
SNP position(s) on amino acid sequence    Alternative amino acid(s)    Previously known SNP?
122                                       A −> S                       Yes
145                                       G −>                         No
180                                       C −>                         No
180                                       C −> W                       No
192                                       Y −>                         No
209                                       A −>                         No
211                                       T −>                         No

Variant protein T39971_P12 (SEQ ID NO:1288) is encoded by the following transcript(s): T39971_T16 (SEQ ID NO:7), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T39971_T16 (SEQ ID NO:7) is shown in bold; this coding portion starts at position 756 and ends at position 1469. The transcript also has the following SNPs as listed in Table 53 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T39971_P12 (SEQ ID NO:1288) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 53 Nucleic acid SNPs
SNP position on nucleotide sequence    Alternative nucleic acid    Previously known SNP?
417                                    G −> C                      Yes
459                                    T −> C                      Yes
1387                                   C −>                        No
1406                                   −> A                        No
1406                                   −> G                        No
529                                    G −> T                      Yes
1119                                   G −> T                      Yes
1188                                   G −>                        No
1295                                   C −>                        No
1295                                   C −> G                      No
1324                                   −> T                        No
1331                                   C −>                        No
1381                                   C −>                        No
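The coding-portion coordinates given for the four transcripts can be cross-checked against the lengths of the variant proteins with a short sketch. It assumes that positions are 1-based and inclusive and that the stated end position is the last nucleotide of the last amino acid codon, that is, the stop codon is not counted; the text does not state this explicitly. The function name is hypothetical.

```python
# Coding portion (start, end) of each transcript, as stated in the text.
CODING_PORTIONS = {
    "T39971_T5": (756, 1604),   # encodes T39971_P6
    "T39971_T10": (756, 2096),  # encodes T39971_P9
    "T39971_T12": (756, 1844),  # encodes T39971_P11
    "T39971_T16": (756, 1469),  # encodes T39971_P12
}


def encoded_length(cds_start: int, cds_end: int) -> int:
    """Number of amino acids encoded by an inclusive, 1-based coding span.

    Assumes the span holds whole codons and excludes the stop codon.
    """
    span = cds_end - cds_start + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("coding span is not a positive multiple of three")
    return span // 3


for transcript, (start, end) in CODING_PORTIONS.items():
    print(transcript, encoded_length(start, end))
```

Under these assumptions the four spans correspond to 283, 447, 363 and 238 amino acids, which agree with the last amino acid positions recited for T39971_P6, T39971_P9, T39971_P11 and T39971_P12 in the comparison reports above.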

As noted above, cluster T39971 features 28 segment(s), which were listed in Table 41 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster T39971_node_0 (SEQ ID NO:1039) according to the present invention is supported by 76 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10 (SEQ ID NO:5), T39971_T12 (SEQ ID NO:6), T39971_T16 (SEQ ID NO:7) and T39971_T5 (SEQ ID NO:8). Table 54 below describes the starting and ending position of this segment on each transcript.

TABLE 54 Segment location on transcripts
Transcript name               Segment starting position    Segment ending position
T39971_T10 (SEQ ID NO: 5)     1                            810
T39971_T12 (SEQ ID NO: 6)     1                            810
T39971_T16 (SEQ ID NO: 7)     1                            810
T39971_T5 (SEQ ID NO: 8)      1                            810

Segment cluster T39971_node_18 (SEQ ID NO:1040) according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T16 (SEQ ID NO:7). Table 55 below describes the starting and ending position of this segment on each transcript.

TABLE 55 Segment location on transcripts
Transcript name               Segment starting position    Segment ending position
T39971_T16 (SEQ ID NO: 7)     1425                         1592

Segment cluster T39971_node_21 (SEQ ID NO:1041) according to the present invention is supported by 99 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10 (SEQ ID NO:5), T39971_T12 (SEQ ID NO:6) and T39971_T5 (SEQ ID NO:8). Table 56 below describes the starting and ending position of this segment on each transcript.

TABLE 56 Segment location on transcripts
Transcript name               Segment starting position    Segment ending position
T39971_T10 (SEQ ID NO: 5)     1425                         1581
T39971_T12 (SEQ ID NO: 6)     1425                         1581
T39971_T5 (SEQ ID NO: 8)      1425                         1581

Segment cluster T39971_node_22 (SEQ ID NO:1042) according to the present invention is supported by 7 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T5 (SEQ ID NO:8). Table 57 below describes the starting and ending position of this segment on each transcript.

TABLE 57 Segment location on transcripts
Transcript name               Segment starting position    Segment ending position
T39971_T5 (SEQ ID NO: 8)      1582                         1779

Segment cluster T39971_node_23 (SEQ ID NO:1043) according to the present invention is supported by 101 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10 (SEQ ID NO:5), T39971_T12 (SEQ ID NO:6) and T39971_T5 (SEQ ID NO:8). Table 58 below describes the starting and ending position of this segment on each transcript.

TABLE 58 Segment location on transcripts
Transcript name               Segment starting position    Segment ending position
T39971_T10 (SEQ ID NO: 5)     1582                         1734
T39971_T12 (SEQ ID NO: 6)     1582                         1734
T39971_T5 (SEQ ID NO: 8)      1780                         1932

Segment cluster T39971_node_31 (SEQ ID NO:1044) according to the present invention is supported by 94 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10 (SEQ ID NO:5) and T39971_T5 (SEQ ID NO:8). Table 59 below describes the starting and ending position of this segment on each transcript.

TABLE 59 Segment location on transcripts
Transcript name               Segment starting position    Segment ending position
T39971_T10 (SEQ ID NO: 5)     1847                         1986
T39971_T5 (SEQ ID NO: 8)      2138                         2277

Segment cluster T39971_node_33 (SEQ ID NO:1045) according to the present invention is supported by 77 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10 (SEQ ID NO:5), T39971_T12 (SEQ ID NO:6) and T39971_T5 (SEQ ID NO:8). Table 60 below describes the starting and ending position of this segment on each transcript.

TABLE 60 Segment location on transcripts
Transcript name               Segment starting position    Segment ending position
T39971_T10 (SEQ ID NO: 5)     1987                         2113
T39971_T12 (SEQ ID NO: 6)     1735                         1861
T39971_T5 (SEQ ID NO: 8)      2278                         2404

Segment cluster T39971_node_7 (SEQ ID NO:1046) according to the present invention is supported by 87 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10 (SEQ ID NO:5), T39971_T12 (SEQ ID NO:6), T39971_T16 (SEQ ID NO:7) and T39971_T5 (SEQ ID NO:8). Table 61 below describes the starting and ending position of this segment on each transcript.

TABLE 61 Segment location on transcripts
Transcript name               Segment starting position    Segment ending position
T39971_T10 (SEQ ID NO: 5)     940                          1162
T39971_T12 (SEQ ID NO: 6)     940                          1162
T39971_T16 (SEQ ID NO: 7)     940                          1162
T39971_T5 (SEQ ID NO: 8)      940                          1162

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
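The distinction between the segments described above and the short segments can be illustrated by computing segment lengths from the tabulated starting and ending positions. The sketch below uses positions on transcript T39971_T16 as given in the segment location tables of this section, assumes positions are 1-based and inclusive, and treats “up to about 120 bp” as a hard cutoff of 120, which is a simplification. The names used are hypothetical.

```python
SHORT_SEGMENT_MAX_BP = 120  # assumption: "about 120 bp" taken as exactly 120

# (start, end) of selected segments on transcript T39971_T16 (SEQ ID NO: 7).
SEGMENTS_ON_T16 = {
    "T39971_node_0": (1, 810),
    "T39971_node_1": (811, 819),
    "T39971_node_7": (940, 1162),
    "T39971_node_17": (1341, 1424),
    "T39971_node_18": (1425, 1592),
}


def segment_length(start: int, end: int) -> int:
    """Length in bp of an inclusive, 1-based segment."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


for name, (start, end) in SEGMENTS_ON_T16.items():
    length = segment_length(start, end)
    kind = "short" if length <= SHORT_SEGMENT_MAX_BP else "long"
    print(f"{name}: {length} bp ({kind})")
```

With these coordinates, T39971_node_1 and T39971_node_17 fall at or below the cutoff and the other three segments exceed it, which is consistent with the grouping used in this description.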

Segment cluster T39971_node_1 (SEQ ID NO:1047) according to the present invention can be found in the following transcript(s): T39971_T10 (SEQ ID NO:5), T39971_T12 (SEQ ID NO:6), T39971_T16 (SEQ ID NO:7) and T39971_T5 (SEQ ID NO:8). Table 62 below describes the starting and ending position of this segment on each transcript.

TABLE 62 Segment location on transcripts
Transcript name               Segment starting position    Segment ending position
T39971_T10 (SEQ ID NO: 5)     811                          819
T39971_T12 (SEQ ID NO: 6)     811                          819
T39971_T16 (SEQ ID NO: 7)     811                          819
T39971_T5 (SEQ ID NO: 8)      811                          819

Segment cluster T39971_node_10 (SEQ ID NO:1048) according to the present invention is supported by 77 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10 (SEQ ID NO:5), T39971_T12 (SEQ ID NO:6), T39971_T16 (SEQ ID NO:7) and T39971_T5 (SEQ ID NO:8). Table 63 below describes the starting and ending position of this segment on each transcript.

TABLE 63 Segment location on transcripts
Transcript name               Segment starting position    Segment ending position
T39971_T10 (SEQ ID NO: 5)     1189                         1232
T39971_T12 (SEQ ID NO: 6)     1189                         1232
T39971_T16 (SEQ ID NO: 7)     1189                         1232
T39971_T5 (SEQ ID NO: 8)      1189                         1232

Segment cluster T39971_node_11 (SEQ ID NO:1049) according to the present invention is supported by 79 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10 (SEQ ID NO:5), T39971_T12 (SEQ ID NO:6), T39971_T16 (SEQ ID NO:7) and T39971_T5 (SEQ ID NO:8). Table 64 below describes the starting and ending position of this segment on each transcript.

TABLE 64 Segment location on transcripts
Transcript name               Segment starting position    Segment ending position
T39971_T10 (SEQ ID NO: 5)     1233                         1270
T39971_T12 (SEQ ID NO: 6)     1233                         1270
T39971_T16 (SEQ ID NO: 7)     1233                         1270
T39971_T5 (SEQ ID NO: 8)      1233                         1270

Segment cluster T39971_node_12 (SEQ ID NO:1050) according to the present invention can be found in the following transcript(s): T39971_T10 (SEQ ID NO:5), T39971_T12 (SEQ ID NO:6), T39971_T16 (SEQ ID NO:7) and T39971_T5 (SEQ ID NO:8). Table 65 below describes the starting and ending position of this segment on each transcript.

TABLE 65 Segment location on transcripts
Transcript name               Segment starting position    Segment ending position
T39971_T10 (SEQ ID NO: 5)     1271                         1284
T39971_T12 (SEQ ID NO: 6)     1271                         1284
T39971_T16 (SEQ ID NO: 7)     1271                         1284
T39971_T5 (SEQ ID NO: 8)      1271                         1284

Segment cluster T39971_node_15 (SEQ ID NO:1051) according to the present invention is supported by 79 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10 (SEQ ID NO:5), T39971_T12 (SEQ ID NO:6), T39971_T16 (SEQ ID NO:7) and T39971_T5 (SEQ ID NO:8). Table 66 below describes the starting and ending position of this segment on each transcript.

TABLE 66 Segment location on transcripts
Transcript name               Segment starting position    Segment ending position
T39971_T10 (SEQ ID NO: 5)     1285                         1316
T39971_T12 (SEQ ID NO: 6)     1285                         1316
T39971_T16 (SEQ ID NO: 7)     1285                         1316
T39971_T5 (SEQ ID NO: 8)      1285                         1316

Segment cluster T39971_node_16 (SEQ ID NO:1052) according to the present invention can be found in the following transcript(s): T39971_T10 (SEQ ID NO:5), T39971_T12 (SEQ ID NO:6), T39971_T16 (SEQ ID NO:7) and T39971_T5 (SEQ ID NO:8). Table 67 below describes the starting and ending position of this segment on each transcript.

TABLE 67
Segment location on transcripts

Transcript name              Segment starting position   Segment ending position
T39971_T10 (SEQ ID NO: 5)    1317                        1340
T39971_T12 (SEQ ID NO: 6)    1317                        1340
T39971_T16 (SEQ ID NO: 7)    1317                        1340
T39971_T5 (SEQ ID NO: 8)     1317                        1340

Segment cluster T39971_node_17 (SEQ ID NO:1053) according to the present invention is supported by 86 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10 (SEQ ID NO:5), T39971_T12 (SEQ ID NO:6), T39971_T16 (SEQ ID NO:7) and T39971_T5 (SEQ ID NO:8). Table 68 below describes the starting and ending position of this segment on each transcript.

TABLE 68
Segment location on transcripts

Transcript name              Segment starting position   Segment ending position
T39971_T10 (SEQ ID NO: 5)    1341                        1424
T39971_T12 (SEQ ID NO: 6)    1341                        1424
T39971_T16 (SEQ ID NO: 7)    1341                        1424
T39971_T5 (SEQ ID NO: 8)     1341                        1424

Segment cluster T39971_node_26 (SEQ ID NO:1054) according to the present invention is supported by 85 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T5 (SEQ ID NO:8). Table 69 below describes the starting and ending position of this segment on each transcript.

TABLE 69
Segment location on transcripts

Transcript name              Segment starting position   Segment ending position
T39971_T5 (SEQ ID NO: 8)     1933                        1974

Segment cluster T39971_node_27 (SEQ ID NO:1055) according to the present invention is supported by 90 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T5 (SEQ ID NO:8). Table 70 below describes the starting and ending position of this segment on each transcript.

TABLE 70
Segment location on transcripts

Transcript name              Segment starting position   Segment ending position
T39971_T5 (SEQ ID NO: 8)     1975                        2025

Segment cluster T39971_node_28 (SEQ ID NO:1056) according to the present invention can be found in the following transcript(s): T39971_T10 (SEQ ID NO:5) and T39971_T5 (SEQ ID NO:8). Table 71 below describes the starting and ending position of this segment on each transcript.

TABLE 71
Segment location on transcripts

Transcript name              Segment starting position   Segment ending position
T39971_T10 (SEQ ID NO: 5)    1735                        1743
T39971_T5 (SEQ ID NO: 8)     2026                        2034
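Because a segment is a fixed stretch of sequence, its length should be the same on every transcript that contains it, even where its coordinates differ between transcripts. The following is a minimal sketch of that consistency check for Table 71, assuming positions are 1-based and inclusive (an assumption inferred from adjacent segments abutting without overlap); the names used are illustrative and are not part of the application.

```python
# Positions taken from Table 71; assumed to be 1-based and inclusive.
segment_positions = {
    "T39971_T10": (1735, 1743),
    "T39971_T5": (2026, 2034),
}


def segment_length(start, end):
    """Return the length of a segment given inclusive start and end positions."""
    return end - start + 1


# Length of the segment on each transcript.
lengths = {name: segment_length(*pos) for name, pos in segment_positions.items()}

# The segment is consistent if every transcript reports the same length.
consistent = len(set(lengths.values())) == 1
```

Under these assumptions the segment is 9 nucleotides long on both transcripts.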

Segment cluster T39971_node_29 (SEQ ID NO:1057) according to the present invention is supported by 99 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10 (SEQ ID NO:5) and T39971_T5 (SEQ ID NO:8). Table 72 below describes the starting and ending position of this segment on each transcript.

TABLE 72
Segment location on transcripts

Transcript name              Segment starting position   Segment ending position
T39971_T10 (SEQ ID NO: 5)    1744                        1838
T39971_T5 (SEQ ID NO: 8)     2035                        2129

Segment cluster T39971_node_3 (SEQ ID NO:1058) according to the present invention is supported by 78 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10 (SEQ ID NO:5), T39971_T12 (SEQ ID NO:6), T39971_T16 (SEQ ID NO:7) and T39971_T5 (SEQ ID NO:8). Table 73 below describes the starting and ending position of this segment on each transcript.

TABLE 73
Segment location on transcripts

Transcript name              Segment starting position   Segment ending position
T39971_T10 (SEQ ID NO: 5)    820                         861
T39971_T12 (SEQ ID NO: 6)    820                         861
T39971_T16 (SEQ ID NO: 7)    820                         861
T39971_T5 (SEQ ID NO: 8)     820                         861

Segment cluster T39971_node_30 (SEQ ID NO:1059) according to the present invention can be found in the following transcript(s): T39971_T10 (SEQ ID NO:5) and T39971_T5 (SEQ ID NO:8). Table 74 below describes the starting and ending position of this segment on each transcript.

TABLE 74
Segment location on transcripts

Transcript name              Segment starting position   Segment ending position
T39971_T10 (SEQ ID NO: 5)    1839                        1846
T39971_T5 (SEQ ID NO: 8)     2130                        2137

Segment cluster T39971_node_34 (SEQ ID NO:1060) according to the present invention can be found in the following transcript(s): T39971_T10 (SEQ ID NO:5), T39971_T12 (SEQ ID NO:6) and T39971_T5 (SEQ ID NO:8). Table 75 below describes the starting and ending position of this segment on each transcript.

TABLE 75
Segment location on transcripts

Transcript name              Segment starting position   Segment ending position
T39971_T10 (SEQ ID NO: 5)    2114                        2120
T39971_T12 (SEQ ID NO: 6)    1862                        1868
T39971_T5 (SEQ ID NO: 8)     2405                        2411

Segment cluster T39971_node_35 (SEQ ID NO:1061) according to the present invention can be found in the following transcript(s): T39971_T10 (SEQ ID NO:5), T39971_T12 (SEQ ID NO:6) and T39971_T5 (SEQ ID NO:8). Table 76 below describes the starting and ending position of this segment on each transcript.

TABLE 76
Segment location on transcripts

Transcript name              Segment starting position   Segment ending position
T39971_T10 (SEQ ID NO: 5)    2121                        2137
T39971_T12 (SEQ ID NO: 6)    1869                        1885
T39971_T5 (SEQ ID NO: 8)     2412                        2428

Segment cluster T39971_node_36 (SEQ ID NO:1062) according to the present invention is supported by 51 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10 (SEQ ID NO:5), T39971_T12 (SEQ ID NO:6) and T39971_T5 (SEQ ID NO:8). Table 77 below describes the starting and ending position of this segment on each transcript.

TABLE 77
Segment location on transcripts

Transcript name              Segment starting position   Segment ending position
T39971_T10 (SEQ ID NO: 5)    2138                        2199
T39971_T12 (SEQ ID NO: 6)    1886                        1947
T39971_T5 (SEQ ID NO: 8)     2429                        2490

Segment cluster T39971_node_4 (SEQ ID NO:1063) according to the present invention can be found in the following transcript(s): T39971_T10 (SEQ ID NO:5), T39971_T12 (SEQ ID NO:6), T39971_T16 (SEQ ID NO:7) and T39971_T5 (SEQ ID NO:8). Table 78 below describes the starting and ending position of this segment on each transcript.

TABLE 78
Segment location on transcripts

Transcript name              Segment starting position   Segment ending position
T39971_T10 (SEQ ID NO: 5)    862                         881
T39971_T12 (SEQ ID NO: 6)    862                         881
T39971_T16 (SEQ ID NO: 7)    862                         881
T39971_T5 (SEQ ID NO: 8)     862                         881

Segment cluster T39971_node_5 (SEQ ID NO:1064) according to the present invention is supported by 80 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T39971_T10 (SEQ ID NO:5), T39971_T12 (SEQ ID NO:6), T39971_T16 (SEQ ID NO:7) and T39971_T5 (SEQ ID NO:8). Table 79 below describes the starting and ending position of this segment on each transcript.

TABLE 79
Segment location on transcripts

Transcript name              Segment starting position   Segment ending position
T39971_T10 (SEQ ID NO: 5)    882                         939
T39971_T12 (SEQ ID NO: 6)    882                         939
T39971_T16 (SEQ ID NO: 7)    882                         939
T39971_T5 (SEQ ID NO: 8)     882                         939

Segment cluster T39971_node_8 (SEQ ID NO:1065) according to the present invention can be found in the following transcript(s): T39971_T10 (SEQ ID NO:5), T39971_T12 (SEQ ID NO:6), T39971_T16 (SEQ ID NO:7) and T39971_T5 (SEQ ID NO:8). Table 80 below describes the starting and ending position of this segment on each transcript.

TABLE 80
Segment location on transcripts

Transcript name              Segment starting position   Segment ending position
T39971_T10 (SEQ ID NO: 5)    1163                        1168
T39971_T12 (SEQ ID NO: 6)    1163                        1168
T39971_T16 (SEQ ID NO: 7)    1163                        1168
T39971_T5 (SEQ ID NO: 8)     1163                        1168

Segment cluster T39971_node_9 (SEQ ID NO:1066) according to the present invention can be found in the following transcript(s): T39971_T10 (SEQ ID NO:5), T39971_T12 (SEQ ID NO:6), T39971_T16 (SEQ ID NO:7) and T39971_T5 (SEQ ID NO:8). Table 81 below describes the starting and ending position of this segment on each transcript.

TABLE 81
Segment location on transcripts

Transcript name              Segment starting position   Segment ending position
T39971_T10 (SEQ ID NO: 5)    1169                        1188
T39971_T12 (SEQ ID NO: 6)    1169                        1188
T39971_T16 (SEQ ID NO: 7)    1169                        1188
T39971_T5 (SEQ ID NO: 8)     1169                        1188

Variant protein alignment to the previously known protein:

Sequence name: /tmp/ckraCL2OcZ/43L7YcPH7x:VTNC_HUMAN (SEQ ID NO:1418)
Sequence documentation:
Alignment of: T39971_P6 (SEQ ID NO:1285) x VTNC_HUMAN (SEQ ID NO:1418) ..
Alignment segment 1/1:
Quality: 2774.00
Escore: 0
Matching length: 278
Total length: 278
Matching Percent Similarity: 99.64
Matching Percent Identity: 99.64
Total Percent Similarity: 99.64
Total Percent Identity: 99.64
Gaps: 0
Alignment: (alignment rows not reproduced)

Sequence name: /tmp/X4DeeuSlB4/yMubSR5FPs:VTNC_HUMAN (SEQ ID NO:1418)
Sequence documentation:
Alignment of: T39971_P9 (SEQ ID NO:1286) x VTNC_HUMAN (SEQ ID NO:1418) ..
Alignment segment 1/1:
Quality: 4430.00
Escore: 0
Matching length: 447
Total length: 478
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 93.51
Total Percent Identity: 93.51
Gaps: 1
Alignment: (alignment rows not reproduced)
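The reported alignment statistics appear to be related in a simple way: the total percent identity is consistent with the number of identical positions in the matched region divided by the total alignment length. The sketch below illustrates that relationship only; it is an inference from the reported numbers, not a description of how the alignment program computes them, and the function name is hypothetical.

```python
def total_percent_identity(matching_length, matching_percent_identity, total_length):
    """Estimate total percent identity from the reported per-alignment figures.

    Assumes (as an inference, not a documented formula) that identical
    positions = matching_length * matching_percent_identity / 100, and that
    the total figure divides this by the full alignment length.
    """
    identical_positions = matching_length * matching_percent_identity / 100.0
    return 100.0 * identical_positions / total_length


# T39971_P9 x VTNC_HUMAN: matching length 447, total length 478.
p9_total_identity = total_percent_identity(447, 100.00, 478)

# T39971_P11 x VTNC_HUMAN: matching length 363, total length 478.
p11_total_identity = total_percent_identity(363, 100.00, 478)
```

Under this assumption the two values come out close to the reported 93.51 and 75.94, respectively.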

Sequence name: /tmp/jvp1VtnxNy/wxNSeFVZZw:VTNC_HUMAN (SEQ ID NO:1418)
Sequence documentation:
Alignment of: T39971_P11 (SEQ ID NO:1287) x VTNC_HUMAN (SEQ ID NO:1418)
Alignment segment 1/1:
Quality: 3576.00
Escore: 0
Matching length: 363
Total length: 478
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 75.94
Total Percent Identity: 75.94
Gaps: 1
Alignment: (only the following rows are reproduced)

326 .................................................. 326
351 AGRIYISGMAPRPSLAKKQRFRHRNRKGYRSQRGHSRGRNQNSRRPSRAT 400

Sequence name: /tmp/jvp1VtnxNy/wxNSeFVZZw:Q9BSH7
Sequence documentation:
Alignment of: T39971_P11 (SEQ ID NO:1287) x Q9BSH7 ..
Alignment segment 1/1:
Quality: 3576.00
Escore: 0
Matching length: 363
Total length: 478
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 75.94
Total Percent Identity: 75.94
Gaps: 1
Alignment: (only the following rows are reproduced)

326 .................................................. 326
351 AGRIYISGMAPRPSLAKKQRFRHRNRKGYRSQRGHSRGRNQNSRRPSRAM 400

Sequence name: /tmp/fgebv7ir4i/48bTBMziJ0:VTNC_HUMAN (SEQ ID NO:1418)
Sequence documentation:
Alignment of: T39971_P12 (SEQ ID NO:1288) x VTNC_HUMAN (SEQ ID NO:1418)
Alignment segment 1/1:
Quality: 2237.00
Escore: 0
Matching length: 223
Total length: 223
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment: (alignment rows not reproduced)

Sequence name: /tmp/fgebv7ir4i/48bTBMziJ0:Q9BSH7
Sequence documentation:
Alignment of: T39971_P12 (SEQ ID NO:1288) x Q9BSH7 ..
Alignment segment 1/1:
Quality: 2237.00
Escore: 0
Matching length: 223
Total length: 223
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment: (alignment rows not reproduced)

Description for Cluster Z21368

Cluster Z21368 features 7 transcript(s) and 34 segment(s) of interest, the names for which are given in Tables 82 and 83, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 84.

TABLE 82
Transcripts of interest

Transcript Name        Sequence ID No.
Z21368_PEA_1_T10       9
Z21368_PEA_1_T11       10
Z21368_PEA_1_T23       11
Z21368_PEA_1_T24       12
Z21368_PEA_1_T5        13
Z21368_PEA_1_T6        14
Z21368_PEA_1_T9        15

TABLE 83
Segments of interest

Segment Name              Sequence ID No.
Z21368_PEA_1_node_0       1067
Z21368_PEA_1_node_15      1068
Z21368_PEA_1_node_19      1069
Z21368_PEA_1_node_2       1070
Z21368_PEA_1_node_21      1071
Z21368_PEA_1_node_33      1072
Z21368_PEA_1_node_36      1073
Z21368_PEA_1_node_37      1074
Z21368_PEA_1_node_39      1075
Z21368_PEA_1_node_4       1076
Z21368_PEA_1_node_41      1077
Z21368_PEA_1_node_43      1078
Z21368_PEA_1_node_45      1079
Z21368_PEA_1_node_53      1080
Z21368_PEA_1_node_56      1081
Z21368_PEA_1_node_58      1082
Z21368_PEA_1_node_66      1083
Z21368_PEA_1_node_67      1084
Z21368_PEA_1_node_69      1085
Z21368_PEA_1_node_11      1086
Z21368_PEA_1_node_12      1087
Z21368_PEA_1_node_16      1088
Z21368_PEA_1_node_17      1089
Z21368_PEA_1_node_23      1090
Z21368_PEA_1_node_24      1091
Z21368_PEA_1_node_30      1092
Z21368_PEA_1_node_31      1093
Z21368_PEA_1_node_38      1094
Z21368_PEA_1_node_47      1095
Z21368_PEA_1_node_49      1096
Z21368_PEA_1_node_51      1097
Z21368_PEA_1_node_61      1098
Z21368_PEA_1_node_68      1099
Z21368_PEA_1_node_7       1100

TABLE 84
Proteins of interest

Protein Name           Sequence ID No.
Z21368_PEA_1_P2        1289
Z21368_PEA_1_P5        1290
Z21368_PEA_1_P15       1291
Z21368_PEA_1_P16       1292
Z21368_PEA_1_P22       1293
Z21368_PEA_1_P23       1294

These sequences are variants of the known protein Extracellular sulfatase Sulf-1 precursor (SwissProt accession identifier SUL1_HUMAN; known also according to the synonyms EC 3.1.6.-; HSulf-1), SEQ ID NO:1419, referred to herein as the previously known protein.

Protein Extracellular sulfatase Sulf-1 precursor (SEQ ID NO:1419) is known or believed to have the following function(s): Exhibits arylsulfatase activity and highly specific endoglucosamine-6-sulfatase activity. It can remove sulfate from the C-6 position of glucosamine within specific subregions of intact heparin. Diminishes HSPG (heparan sulfate proteoglycans) sulfation, inhibits signaling by heparin-dependent growth factors, diminishes proliferation, and facilitates apoptosis in response to exogenous stimulation. The sequence for protein Extracellular sulfatase Sulf-1 precursor is given at the end of the application, as "Extracellular sulfatase Sulf-1 precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 85.

TABLE 85
Amino acid mutations for Known Protein

SNP position(s) on amino acid sequence   Comment
87-88                                    CC->AA: LOSS OF ARYLSULFATASE ACTIVITY AND LOSS OF ABILITY TO MODULATE APOPTOSIS.
49                                       L -> P
728                                      K -> R

Protein Extracellular sulfatase Sulf-1 precursor (SEQ ID NO:1419) localization is believed to be Endoplasmic reticulum and Golgi stack. Also localized on the cell surface (By similarity).

The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: apoptosis; metabolism; heparan sulfate proteoglycan metabolism, which are annotation(s) related to Biological Process; arylsulfatase; hydrolase, which are annotation(s) related to Molecular Function; and extracellular space; endoplasmic reticulum; Golgi apparatus, which are annotation(s) related to Cellular Component.

The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <dot expasy dot ch/sprot/>; or Locuslink, available from <dot ncbi dot nlm dot nih dot gov/projects/LocusLink/>.

Cluster Z21368 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the right hand column of the table and the numbers on the y-axis of FIG. 13 refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
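The "parts per million" figure can be illustrated with a short calculation. This is a minimal sketch only: it uses plain, unweighted EST counts, whereas the text refers to weighted expression whose weighting scheme is described elsewhere in the application; the function name and the example counts are hypothetical.

```python
def expression_ppm(cluster_est_count, total_est_count):
    """Return cluster expression as parts per million of all ESTs in a category.

    Unweighted simplification: ppm = 1e6 * (cluster ESTs / all ESTs).
    """
    if total_est_count <= 0:
        raise ValueError("total_est_count must be positive")
    return 1_000_000 * cluster_est_count / total_est_count


# Hypothetical example: 12 ESTs from the cluster out of 500,000 ESTs
# sequenced from libraries of one tissue category.
example_ppm = expression_ppm(12, 500_000)
```

With these hypothetical counts the cluster would be reported at 24 parts per million for that category.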

Overall, the following results were obtained as shown with regard to the histograms in FIG. 13 and Table 86. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: epithelial malignant tumors, a mixture of malignant tumors from different tissues and pancreas carcinoma.

TABLE 86
Normal tissue distribution

Name of Tissue    Number
bladder           123
Bone              557
Brain             34
Colon             94
epithelial        56
general           68
head and neck     0
kidney            35
Lung              22
Lymph nodes       0
Breast            52
muscle            31
Ovary             0
pancreas          0
prostate          44
Skin              67
stomach           109
T cells           0
Thyroid           0
Uterus            140

TABLE 87
P values and ratios for expression in cancerous tissue

Name of Tissue    P1        P2        SP1       R3     SP2       R4
bladder           5.4e−01   6.6e−01   6.4e−01   1.0    8.5e−01   0.7
Bone              4.5e−01   8.2e−01   9.1e−01   0.4    1         0.3
Brain             5.5e−01   7.3e−01   1.5e−01   1.5    5.0e−01   0.9
Colon             1.4e−01   2.8e−01   1.0e−01   2.0    3.0e−01   1.4
epithelial        1.1e−03   1.5e−01   1.2e−07   2.1    1.0e−01   1.1
general           1.4e−05   5.3e−02   1.9e−06   1.6    6.7e−01   0.8
head and neck     2.4e−02   7.1e−02   4.6e−01   2.5    7.5e−01   1.4
kidney            8.9e−01   9.0e−01   1         0.4    1         0.4
Lung              3.5e−01   4.1e−01   7.2e−03   2.6    1.0e−01   1.6
Lymph nodes       7.7e−02   3.1e−01   2.3e−02   8.5    1.9e−01   3.2
Breast            4.0e−01   6.1e−01   5.4e−02   2.3    3.0e−01   1.3
muscle            7.5e−02   3.5e−02   1         1.0    1.7e−01   1.7
Ovary             3.8e−01   4.2e−01   2.2e−01   2.9    3.4e−01   2.2
pancreas          2.2e−02   6.9e−02   1.4e−08   6.5    1.4e−06   4.6
prostate          8.3e−01   8.9e−01   3.1e−01   1.4    5.2e−01   1.1
Skin              6.1e−01   8.1e−01   6.0e−01   1.2    1         0.3
stomach           4.4e−02   5.0e−01   5.0e−01   0.8    9.7e−01   0.4
T cells           5.0e−01   6.7e−01   3.3e−01   3.1    7.2e−01   1.4
Thyroid           3.6e−01   3.6e−01   1         1.1    1         1.1
Uterus            3.5e−01   7.8e−01   4.6e−01   0.9    9.1e−01   0.5

As noted above, cluster Z21368 features 7 transcript(s), which were listed in Table 82 above. These transcript(s) encode for protein(s) which are variant(s) of protein Extracellular sulfatase Sulf-1 precursor (SEQ ID NO:1419). A description of each variant protein according to the present invention is now provided.

Variant protein Z21368_PEA_1_P2 (SEQ ID NO:1289) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z21368_PEA_1_T5 (SEQ ID NO:13). An alignment is given to the known protein (Extracellular sulfatase Sulf-1 precursor (SEQ ID NO:1419)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between Z21368_PEA_1_P2 (SEQ ID NO:1289) and SUL1_HUMAN (SEQ ID NO:1419):

1. An isolated chimeric polypeptide encoding for Z21368_PEA_(—)1_P2 (SEQID NO:1289), comprising a first amino acid sequence being at least 90%homologous toMKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERKNIRPNIILVLTDDQDVELGSLQVMNKTRKIMEHGGATFINAFVTTPMCCPSRSSMLTGKYVHNHNVYTNNENCSSPSWQAMHEPRTFAVYLNNTGYRTAFFGKYLNEYNGSYIPPGWREWLGLIKNSRFYNYTVCRNGIKEKHGFDYAKDYFTDLITNESINYFKMSKRMYPHRPVMMVISHAAPHGPEDSAPQFSKLYPNASQHITPSYNYAPNMDKHWIMQYTGPMLPIHMEFTNILQRKRLQTLMSVDDSVERLYNMLVETGELENTYIIYTADHGYHIGQFGLVKGKSMPYDFDIRVPFFIRGPSVEPGSIVPQIVLNIDLAPTILDIAGLDTPPDVDGKSVLKLLDPEKPGNRFRTNKKAKIWRDTFLVERGKFLRKKEESSKNIQQSNHLPKYERVKELCQQARYQTACEQPGQKWQCIEDTSGKLRIHKCKGPSDLLTVRQSTRNLYARGFHDKDKECSCRESGYRASRSQRKSQRQFLRNQGTPKYKPRFVHTRQTRSLSVEFEGEIYDINLEEEEELQVLQPRNIAKRHDEGHKGPRDLQASSGGNRGRMLADSSNAVGPPTTVRVTHKCFILPNDSIHCERELYQSARAWKDHKAYIDKEIEALQDKIKNLREVRGHLKRRKPEECSCSKQSYYNKEKGVKKQEKLKSHLHPFKEAAQEVDSKLQLFKENNRRRKKERKEKRRQRKGEECSLPGLTCFTHDNNHWQTAPFWN corresponding to amino acids 1-761 of SUL1_HUMAN (SEQ ID NO:1419),which also corresponds to amino acids 1-761 of Z21368_PEA_(—)1_P2 (SEQID NO:1289), and a second amino acid sequence being at least 70%,optionally at least 80%, preferably at least 85%, more preferably atleast 90% and most preferably at least 95% homologous to a polypeptidehaving the sequence PHKYSAHGRTRHFESATRTTNGAQKLSRI (SEQ ID NO: 1759)corresponding to amino acids 762-790 of Z21368_PEA_(—)1_P2 (SEQ IDNO:1289), wherein said first and second amino acid sequences arecontiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of Z21368_PEA_1_P2 (SEQ ID NO:1289), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence PHKYSAHGRTRHFESATRTTNGAQKLSRI (SEQ ID NO: 1759) in Z21368_PEA_1_P2 (SEQ ID NO:1289).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein Z21368_PEA_1_P2 (SEQ ID NO:1289) is encoded by the following transcript(s): Z21368_PEA_1_T5 (SEQ ID NO:13), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z21368_PEA_1_T5 (SEQ ID NO:13) is shown in bold; this coding portion starts at position 529 and ends at position 2898.
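The stated coding coordinates can be cross-checked against the length of the variant protein. The following is a minimal sketch, assuming that positions are 1-based and inclusive and that the stated range does not include the stop codon; both are assumptions rather than statements made in the application, and the function name is illustrative.

```python
def coding_portion_summary(start, end):
    """Summarise a coding portion given inclusive start and end positions.

    Returns (nucleotide_count, whole_codons, leftover_nucleotides).
    """
    nucleotide_count = end - start + 1
    whole_codons, leftover = divmod(nucleotide_count, 3)
    return nucleotide_count, whole_codons, leftover


# Coding portion of transcript Z21368_PEA_1_T5: positions 529 to 2898.
t5_nucleotides, t5_codons, t5_leftover = coding_portion_summary(529, 2898)
```

For this transcript the range spans 2370 nucleotides, that is 790 whole codons with no remainder, which agrees with the comparison report above placing the last residue of the variant protein at amino acid 790.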

Variant protein Z21368_PEA_1_P5 (SEQ ID NO:1290) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z21368_PEA_1_T9 (SEQ ID NO:15). An alignment is given to the known protein (Extracellular sulfatase Sulf-1 precursor (SEQ ID NO:1419)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between Z21368_PEA_1_P5 (SEQ ID NO:1290) and Q7Z2W2 (SEQ ID NO:1697):

1. An isolated chimeric polypeptide encoding for Z21368_PEA_1_P5 (SEQ ID NO:1290), comprising a first amino acid sequence being at least 90% homologous to MKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERKNIRPNIILVLTDDQDVEL corresponding to amino acids 1-57 of Q7Z2W2 (SEQ ID NO:1697), which also corresponds to amino acids 1-57 of Z21368_PEA_1_P5 (SEQ ID NO:1290), a second bridging amino acid sequence comprising A, and a third amino acid sequence being at least 90% homologous to FFGKYLNEYNGSYIPPGWREWLGLIKNSRFYNYTVCRNGIKEKHGFDYAKDYFTDLITNESINYFKMSKRMYPHRPVMMVISHAAPHGPEDSAPQFSKLYPNASQHITPSYNYAPNMDKHWIMQYTGPMLPIHMEFTNILQRKRLQTLMSVDDSVERLYNMLVETGELENTYIIYTADHGYHIGQFGLVKGKSMPYDFDIRVPFFIRGPSVEPGSIVPQIVLNIDLAPTILDIAGLDTPPDVDGKSVLKLLDPEKPGNRFRTNKKAKIWRDTFLVERGKFLRKKEESSKNIQQSNHLPKYERVKELCQQARYQTACEQPGQKWQCIEDTSGKLRIHKCKGPSDLLTVRQSTRNLYARGFHDKDKECSCRESGYRASRSQRKSQRQFLRNQGTPKYKPRFVHTRQTRSLSVEFEGEIYDINLEEEEELQVLQPRNIAKRHDEGHKGPRDLQASSGGNRGRMLADSSNAVGPPTTVRVTHKCFILPNDSIHCERELYQSARAWKDHKAYIDKEIEALQDKIKNLREVRGHLKRRKPEECSCSKQSYYNKEKGVKKQEKLKSHLHPFKEAAQEVDSKLQLFKENNRRRKKERKEKRRQRKGEECSLPGLTCFTHDNNHWQTAPFWNLGSFCACTSSNNNTYWCLRTVNETHNFLFCEFATGFLEYFDMNTDPYQLTNTVHTVERGILNQLHVQLMELRSCQGYKQCNPRPKNLDVGNKDGGSYDLHRGQLWDGWEG corresponding to amino acids 139-871 of Q7Z2W2 (SEQ ID NO:1697), which also corresponds to amino acids 59-791 of Z21368_PEA_1_P5 (SEQ ID NO:1290), wherein said first, second and third amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for an edge portion of Z21368_PEA_1_P5 (SEQ ID NO:1290), comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least three amino acids comprise LAF, the sequence having a structure as follows (numbering according to Z21368_PEA_1_P5 (SEQ ID NO:1290)): a sequence starting from any of amino acid numbers 57−x to 57; and ending at any of amino acid numbers 59+((n−2)−x), in which x varies from 0 to n−2.

Comparison report between Z21368_PEA_1_P5 (SEQ ID NO:1290) and AAH12997 (SEQ ID NO:1698):

1. An isolated chimeric polypeptide encoding for Z21368_PEA_(—)1_P5 (SEQID NO:1290), comprising a first amino acid sequence being at least 70%,optionally at least 80%, preferably at least 85%, more preferably atleast 90% and most preferably at least 95% homologous to a polypeptidehaving the sequenceMKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERKNIRPNIILVLTDDQDVELAFFGKYLNEYNGSYIPPGWREWLGLIKNSRFYNYTVCRNGIKEKHGFDYAKDYFTDLITNESINYFKMSKRMYPHRPVMMVISHAAPHGPEDSAPQFSKLYPNASQHITPSYNYAPNMDKHWIMQYTGPMLPIHMEFTNILQRKRLQTLMSVDDSVERLYNMLVETGELENTYIIYTADHGYHIGQFGLVKGKSMPYDFDIRVPFFIRGPSVEPGSIVPQIVLNIDLAPTILDIAGLDTPPDVDGKSVLKLLDPEKPGNRFRTNKKAKIWRDTFLVERGKFLRKKEESSKNIQQSNHLPKYERVKELCQQARYQTACEQPGQKWQCIEDTSGKLRIHKCKGPSDLLTVRQSTRNLYARGEHDKDKECSCRESGYRASRSQRKSQRQFLRNQGTPKYKPRFVHTRQTRSLSVEFEGEIYDINLEEEEELQVLQPRNIAKRHDEGHKGPRDLQASSGGNRGRMLADSSNAVGPPTTVRVTHKCFILPNDSIHCERELYQSARAWKDHKAYIDKEIEALQDKIKNLREVRGHLKRRKPEECSCSKQSYYNKEKGVKKQEKLKSHLHPFKEAAQEVDSKLQLFKENNRRRKKERKEKRRQRKGEECSLPGLTCFTHDNNHWQTAPFWNLGSFCACTSSNNNTYWCLRTVNETHNFLFCEFATGFLEYFDMNTDPYQLTNTVHTVERGILNQLHVQLME(SEQ ID NO: 1760) corresponding to amino acids 1-751 ofZ21368_PEA_(—)1_P5 (SEQ ID NO:1290), and a second amino acid sequencebeing at least 90% homologous toLRSCQGYKQCNPRPKNLDVGNKDGGSYDLHRGQLWDGWEG corresponding to amino acids1-40 of AAH12997 (SEQ ID NO:1698), which also corresponds to amino acids752-791 of Z21368_PEA_(—)1_P5 (SEQ ID NO:1290), wherein said first andsecond amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a head of Z21368_PEA_(—)1_P5(SEQ ID NO:1290), comprising a polypeptide being at least 70%,optionally at least about 80%, preferably at least about 85%, morepreferably at least about 90% and most preferably at least about 95%homologous to the sequenceMKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERKNIRPNIILVLTDDQDVELAFFGKYLNEYNGSYIPPGWREQLGLIKNSRFYNYTVCRNGIKEKHGFDYAKDYFTDLITNESINYFKMSKRMYPHRPVMMVISHAAPHGPEDSAPQFSKLYPNASQHITPSYNYAPNMDKHWIMQYTGPMLPIHMEFTNILQRKRLQTLMSVDDSVERLYNMLVETGELENTYIIYTADHGYHIGQFGLVKGKSMPYDFDIRVPFFIRGPSVEPGSIVPQIVLNIDLAPTILDIAGLDTPPDVDGKSVLKLLDPEKPGNRFRTNKKAKIWRDTFLVERGKFLRKKEESSKNIQQSNHLPKYERVKELCQQARYQTACEQPGQKWQCIEDTSGKLRIHKCKGPSDLLTVRQSTRNLYARGFHDKDGECSCRESGYRASRSQRKSQRQFLRNQGTPKYKPRFVHTRQTRSLSVEFEGEIYDINLEEEEELQVLQPRNIAKRHDEGHKGPRDLQASSGGNRGRMLADSSNAVGPPTVRVTHKCFILPNDSIHCERELYQSARAWKDHKAYIDKEIEALQDKIKNLREVRGHLKRRKPEECSCSKQSYYNKEKGVKKQEKLKSHLHPFKEAAQEVDSKLQLFKENNRRRKKERKEKRRQRKGEECSLPGLTCFTHDNNHWQTAPFWNLGSFCACTSSNNNTYWCLRTVNETHNFLFCEFATGFLEYFDMNTDPYQLTNTVHTVERGILNQLHVQLME(SEQ ID NO: 1760) of Z21368_PEA_(—)1_P5 (SEQ ID NO:1290).

Comparison report between Z21368_PEA_1_P5 (SEQ ID NO:1290) and SUL1_HUMAN (SEQ ID NO:1419):

1. An isolated chimeric polypeptide encoding for Z21368_PEA_1_P5 (SEQ ID NO:1290), comprising a first amino acid sequence being at least 90% homologous to MKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERKNIRPNIILVLTDDQDVEL corresponding to amino acids 1-57 of SUL1_HUMAN (SEQ ID NO:1419), which also corresponds to amino acids 1-57 of Z21368_PEA_1_P5 (SEQ ID NO:1290), and a second amino acid sequence being at least 90% homologous to AFFGKYLNEYNGSYIPPGWREWLGLIKNSRFYNYTVCRNGIKEKHGFDYAKDYFTDLITNESINYFKMSKRMYPHRPVMMVISHAAPHGPEDSAPQFSKLYPNASQHITPSYNYAPNMDKHWIMQYTGPMLPIHMEFTNILQRKRLQTLMSVDDSVERLYNMLVETGELENTYIIYTADHGYHIGQFGLVKGKSMPYDFDIRVPFFIRGPSVEPGSIVPQIVLNIDLAPTILDIAGLDTPPDVDGKSVLKLLDPEKPGNRFRTNKKAKIWRDTFLVERGKFLRKKEESSKNIQQSNHLPKYERVKELCQQARYQTACEQPGQKWQCIEDTSGKLRIHKCKGPSDLLTVRQSTRNLYARGFHDKDKECSCRESGYRASRSQRKSQRQFLRNQGTPKYKPRFVHTRQTRSLSVEFEGEIYDINLEEEEELQVLQPRNIAKRHDEGHKGPRDLQASSGGNRGRMLADSSNAVGPPTTVRVTHKCFILPNDSIHCERELYQSARAWKDHKAYIDKEIEALQDKIKNLREVRGHLKRRKPEECSCSKQSYYNKEKGVKKQEKLKSHLHPFKEAAQEVDSKLQLFKENNRRRKKERKEKRRQRKGEECSLPGLTCFTHDNNHWQTAPFWNLGSFCACTSSNNNTYWCLRTVNETHNFLFCEFATGFLEYFDMNTDPYQLTNTVHTVERGILNQLHVQLMELRSCQGYKQCNPRPKNLDVGNKDGGSYDLHRGQLWDGWEG corresponding to amino acids 138-871 of SUL1_HUMAN (SEQ ID NO:1419), which also corresponds to amino acids 58-791 of Z21368_PEA_1_P5 (SEQ ID NO:1290), wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated chimeric polypeptide encoding for an edge portion of Z21368_PEA_1_P5 (SEQ ID NO:1290), comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise LA, having a structure as follows: a sequence starting from any of amino acid numbers 57−x to 57; and ending at any of amino acid numbers 58+((n−2)−x), in which x varies from 0 to n−2.
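The edge-portion definition can be read as a family of windows of length n that all span the junction between amino acids 57 and 58. The sketch below enumerates those windows under one reading of the formula, in which a single value of x fixes both the start (57−x) and the end (58+((n−2)−x)); this pairing is an interpretation, and the function name, parameter names and the use of 791 as the protein length are illustrative choices rather than terms of the application.

```python
def edge_windows(n, left=57, right=58, protein_length=791):
    """Enumerate (start, end) windows of length n spanning the left/right junction.

    For each x from 0 to n-2: start = left - x, end = right + (n - 2) - x.
    Positions are 1-based and inclusive; windows falling outside the
    protein are skipped.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to span the junction")
    windows = []
    for x in range(n - 1):  # x = 0 .. n-2
        start = left - x
        end = right + (n - 2) - x
        if start >= 1 and end <= protein_length:
            windows.append((start, end))
    return windows


# All edge portions of length 10 around the junction at amino acids 57/58.
windows_10 = edge_windows(10)
```

For n = 10 this reading yields nine windows, from positions 57 to 66 down to positions 49 to 58, each containing both junction residues.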

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein Z21368_PEA_1_P5 (SEQ ID NO:1290) is encoded by the following transcript(s): Z21368_PEA_1_T9 (SEQ ID NO:15), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z21368_PEA_1_T9 (SEQ ID NO:15) is shown in bold; this coding portion starts at position 556 and ends at position 2928.

Variant protein Z21368_PEA_1_P15 (SEQ ID NO:1291) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z21368_PEA_1_T23 (SEQ ID NO:11). An alignment is given to the known protein (Extracellular sulfatase Sulf-1 precursor (SEQ ID NO:1419)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between Z21368_PEA_1_P15 (SEQ ID NO:1291) and SUL1_HUMAN (SEQ ID NO:1419):

1. An isolated chimeric polypeptide encoding for Z21368_PEA1_P15 (SEQ IDNO:1291), comprising a first amino acid sequence being at least 90homologous toMKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERKNIRPNIILVLTDDQDVELGSLQVMNKTRKIMEHGGATFINAFVTTPMCCPSRSSMLTGKYVHNHNVYTNNENCSSPSWQAMHEPRTFAVYLNNTGYRTAFFGKYLNEYNGSYIPPGWREWLGLIKNSRFYNYTVCRNGIKEKHGFDYAKDYFTDLITNESINYFKMSKRMYPHRPVMMVISHAAPHGPEDSAPQFSKLYPNASQHITPSYNYAPNMDKHWIMQYTGPMLPIHMEFTNILQRKRLQTLMSVDDSVERLYNMLVETGELENTYIIYTADHGYHIGQFGLVKGKSMPYDFDIRVPFFIRGPSVEPGSIVPQIVLNIDLAPTILDIAGLDTPPDVDGKSVLKLLDPEKPGNRFRTNKKAKIWRDTFLVERG corresponding to amino acids 1-416of SUL1_HUMAN (SEQ ID NO:1419), which also corresponds to amino acids1-416 of Z21368_PEA_(—)1_P15 (SEQ ID NO:1291).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein Z21368_PEA_1_P15 (SEQ ID NO:1291) is encoded by the following transcript(s): Z21368_PEA_1_T23 (SEQ ID NO:11), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z21368_PEA_1_T23 (SEQ ID NO:11) is shown in bold; this coding portion starts at position 691 and ends at position 1938.

Variant protein Z21368_PEA_1_P16 (SEQ ID NO:1292) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z21368_PEA_1_T24 (SEQ ID NO:12). An alignment is given to the known protein (Extracellular sulfatase Sulf-1 precursor (SEQ ID NO:1419)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between Z21368_PEA_1_P16 (SEQ ID NO:1292) and SUL1_HUMAN (SEQ ID NO:1419):

1. An isolated chimeric polypeptide encoding for Z21368_PEA_1_P16 (SEQ ID NO:1292), comprising a first amino acid sequence being at least 90% homologous to MKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERKNIRPNIILVLTDDQDVELGSLQVMNKTRKIMEHGGATFINAFVTTPMCCPSRSSMLTGKYVHNHNVYTNNENCSSPSWQAMHEPRTFAVYLNNTGYRTAFFGKYLNEYNGSYIPPGWREWLGLIKNSRFYNYTVCRNGIKEKHGFDYAKDYFTDLITNESINYFKMSKRMYPHRPVMMVISHAAPHGPEDSAPQFSKLYPNASQHITPSYNYAPNMDKHWIMQYTGPMLPIHMEFTNILQRKRLQTLMSVDDSVERLYNMLVETGELENTYIIYTADHGYHIGQFGLVKGKSMPYDFDIRVPFFIRGPSVEPGSIVPQIVLNIDLAPTILDIAGLDTPPDVDGKSVLKLLDPEKPGNR corresponding to amino acids 1-397 of SUL1_HUMAN (SEQ ID NO:1419), which also corresponds to amino acids 1-397 of Z21368_PEA_1_P16 (SEQ ID NO:1292), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence CVIVPPLSQPQIH (SEQ ID NO: 1761) corresponding to amino acids 398-410 of Z21368_PEA_1_P16 (SEQ ID NO:1292), wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of Z21368_PEA_1_P16 (SEQ ID NO:1292), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence CVIVPPLSQPQIH (SEQ ID NO: 1761) in Z21368_PEA_1_P16 (SEQ ID NO:1292).
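The graded homology thresholds above (70%, 80%, 85%, 90%, 95%) can be illustrated with a small calculation. The sketch below assumes, purely for illustration, that homology is measured as ungapped percent identity between equal-length peptides; this section does not define how homology is computed, and the function names are hypothetical.

```python
TAIL_P16 = "CVIVPPLSQPQIH"          # SEQ ID NO: 1761, as given in the text
THRESHOLDS = (95, 90, 85, 80, 70)   # tiers named in the text, highest first


def percent_identity(candidate, reference):
    """Ungapped percent identity between two equal-length peptides."""
    if len(candidate) != len(reference):
        raise ValueError("this simplified measure needs equal-length sequences")
    matches = sum(a == b for a, b in zip(candidate, reference))
    return 100.0 * matches / len(reference)


def highest_tier(candidate, reference=TAIL_P16):
    """Return the highest threshold the candidate meets, or None if below 70%."""
    identity = percent_identity(candidate, reference)
    for threshold in THRESHOLDS:
        if identity >= threshold:
            return threshold
    return None


print(highest_tier("CVIVPPLSQPQIH"))   # identical to the tail
print(highest_tier("CVIVPPLSQPQIA"))   # one substitution in 13 residues
```

For a 13-residue tail, a single substitution already drops identity to 12/13 (about 92.3%), so short tails move between tiers in coarse steps.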

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein Z21368_PEA_1_P16 (SEQ ID NO:1292) is encoded by the following transcript(s): Z21368_PEA_1_T24 (SEQ ID NO:12), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z21368_PEA_1_T24 (SEQ ID NO:12) is shown in bold; this coding portion starts at position 691 and ends at position 1920.

Variant protein Z21368_PEA_1_P22 (SEQ ID NO:1293) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z21368_PEA_1_T10 (SEQ ID NO:9). An alignment is given to the known protein (Extracellular sulfatase Sulf-1 precursor (SEQ ID NO:1419)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between Z21368_PEA_1_P22 (SEQ ID NO:1293) and SUL1_HUMAN (SEQ ID NO:1419):

1. An isolated chimeric polypeptide encoding for Z21368_PEA_1_P22 (SEQ ID NO:1293), comprising a first amino acid sequence being at least 90% homologous to MKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERKNIRPNIILVLTDDQDVELGSLQVMNKTRKIMEHGGATFINAFVTTPMCCPSRSSMLTGKYVHNHNVYTNNENCSSPSWQAMHEPRTFAVYLNNTGYRTAFFGKYLNEYNGSYIPPGWREWLGLIKNSRFYNYTVCRNGIKEKHGFDYAK corresponding to amino acids 1-188 of SUL1_HUMAN (SEQ ID NO:1419), which also corresponds to amino acids 1-188 of Z21368_PEA_1_P22 (SEQ ID NO:1293), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence ARYDGDQPRCAPRPRGLSPTVF (SEQ ID NO: 1762) corresponding to amino acids 189-210 of Z21368_PEA_1_P22 (SEQ ID NO:1293), wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of Z21368_PEA_1_P22 (SEQ ID NO:1293), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ARYDGDQPRCAPRPRGLSPTVF (SEQ ID NO: 1762) in Z21368_PEA_1_P22 (SEQ ID NO:1293).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein Z21368_PEA_1_P22 (SEQ ID NO:1293) is encoded by the following transcript(s): Z21368_PEA_1_T10 (SEQ ID NO:9), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z21368_PEA_1_T10 (SEQ ID NO:9) is shown in bold; this coding portion starts at position 691 and ends at position 1320.

Variant protein Z21368_PEA_1_P23 (SEQ ID NO:1294) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z21368_PEA_1_T11 (SEQ ID NO:10). An alignment is given to the known protein (Extracellular sulfatase Sulf-1 precursor (SEQ ID NO:1419)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between Z21368_PEA_1_P23 (SEQ ID NO:1294) and Q7Z2W2 (SEQ ID NO:1697):

1. An isolated chimeric polypeptide encoding for Z21368_PEA_1_P23 (SEQ ID NO:1294), comprising a first amino acid sequence being at least 90% homologous to MKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERKNIRPNIILVLTDDQDVELGSLQVMNKTRKIMEHGGATFINAFVTTPMCCPSRSSMLTGKYVHNHNVYTNNENCSSPSWQAMHEPRTFAVYLNNTGYRT corresponding to amino acids 1-137 of Q7Z2W2 (SEQ ID NO:1697), which also corresponds to amino acids 1-137 of Z21368_PEA_1_P23 (SEQ ID NO:1294), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GLLHRLNH (SEQ ID NO: 1763) corresponding to amino acids 138-145 of Z21368_PEA_1_P23 (SEQ ID NO:1294), wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of Z21368_PEA_1_P23 (SEQ ID NO:1294), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GLLHRLNH (SEQ ID NO: 1763) in Z21368_PEA_1_P23 (SEQ ID NO:1294).

Comparison report between Z21368_PEA_1_P23 (SEQ ID NO:1294) and SUL1_HUMAN (SEQ ID NO:1419):

1. An isolated chimeric polypeptide encoding for Z21368_PEA_1_P23 (SEQ ID NO:1294), comprising a first amino acid sequence being at least 90% homologous to MKYSCCALVLAVLGTELLGSLCSTVRSPRFRGRIQQERKNIRPNIILVLTDDQDVELGSLQVMNKTRKIMEHGGATFINAFVTTPMCCPSRSSMLTGKYVHNHNVYTNNENCSSPSWQAMHEPRTFAVYLNNTGYRT corresponding to amino acids 1-137 of SUL1_HUMAN (SEQ ID NO:1419), which also corresponds to amino acids 1-137 of Z21368_PEA_1_P23 (SEQ ID NO:1294), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GLLHRLNH (SEQ ID NO: 1763) corresponding to amino acids 138-145 of Z21368_PEA_1_P23 (SEQ ID NO:1294), wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of Z21368_PEA_1_P23 (SEQ ID NO:1294), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GLLHRLNH (SEQ ID NO: 1763) in Z21368_PEA_1_P23 (SEQ ID NO:1294).
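Each chimeric variant above is described as a head shared with the known protein followed by a unique tail. As a quick sanity check, the sketch below confirms that the tail coordinates quoted in the comparison reports start immediately after the head and span exactly the length of the quoted tail sequence. The dictionary layout and the function name are illustrative assumptions; the numbers and sequences are those stated in the text.

```python
VARIANTS = {
    "Z21368_PEA_1_P16": {"head_end": 397, "tail": "CVIVPPLSQPQIH",
                         "tail_span": (398, 410)},
    "Z21368_PEA_1_P22": {"head_end": 188, "tail": "ARYDGDQPRCAPRPRGLSPTVF",
                         "tail_span": (189, 210)},
    "Z21368_PEA_1_P23": {"head_end": 137, "tail": "GLLHRLNH",
                         "tail_span": (138, 145)},
}


def layout_is_consistent(head_end, tail, tail_span):
    """True when the tail begins right after the head (contiguous, sequential)
    and its 1-based inclusive span matches the tail sequence length."""
    start, end = tail_span
    contiguous = start == head_end + 1
    length_matches = (end - start + 1) == len(tail)
    return contiguous and length_matches


for name, layout in VARIANTS.items():
    print(name, layout_is_consistent(**layout))
```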

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein Z21368_PEA_1_P23 (SEQ ID NO:1294) is encoded by the following transcript(s): Z21368_PEA_1_T11 (SEQ ID NO:10), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z21368_PEA_1_T11 (SEQ ID NO:10) is shown in bold; this coding portion starts at position 691 and ends at position 1125.
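The coding-portion coordinates quoted for each transcript can be cross-checked against the protein lengths given in the comparison reports. The sketch below assumes the positions are 1-based and inclusive, and that the coding portion as quoted does not include a stop codon; both are assumptions, although the quoted figures are consistent with them. The names used are illustrative.

```python
# transcript: (coding start, coding end, protein length from the comparison reports)
CODING_PORTIONS = {
    "Z21368_PEA_1_T23": (691, 1938, 416),   # encodes Z21368_PEA_1_P15
    "Z21368_PEA_1_T24": (691, 1920, 410),   # encodes Z21368_PEA_1_P16
    "Z21368_PEA_1_T10": (691, 1320, 210),   # encodes Z21368_PEA_1_P22
    "Z21368_PEA_1_T11": (691, 1125, 145),   # encodes Z21368_PEA_1_P23
}


def codon_count(start, end):
    """Number of whole codons in a 1-based inclusive nucleotide range."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("coding range is not a multiple of three nucleotides")
    return length // 3


for transcript, (start, end, protein_length) in CODING_PORTIONS.items():
    print(transcript, codon_count(start, end), protein_length)
```

For example, positions 691 to 1938 cover 1248 nucleotides, or 416 codons, which matches the 416 amino acids reported for Z21368_PEA_1_P15.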

As noted above, cluster Z21368 features 34 segment(s), which were listed in Table 83 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster Z21368_PEA_1_node_0 (SEQ ID NO:1067) according to the present invention libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T9 (SEQ ID NO:15). Table 88 below describes the starting and ending position of this segment on each transcript.

TABLE 88 Segment location on transcripts

| Transcript name | Segment starting position | Segment ending position |
|---|---|---|
| Z21368_PEA_1_T9 (SEQ ID NO: 15) | 1 | 327 |

Segment cluster Z21368_PEA_1_node_15 (SEQ ID NO:1068) according to the present invention is supported by 26 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10 (SEQ ID NO:9), Z21368_PEA_1_T11 (SEQ ID NO:10), Z21368_PEA_1_T23 (SEQ ID NO:11), Z21368_PEA_1_T24 (SEQ ID NO:12), Z21368_PEA_1_T5 (SEQ ID NO:13), Z21368_PEA_1_T6 (SEQ ID NO:14) and Z21368_PEA_1_T9 (SEQ ID NO:15). Table 89 below describes the starting and ending position of this segment on each transcript.

TABLE 89 Segment location on transcripts

| Transcript name | Segment starting position | Segment ending position |
|---|---|---|
| Z21368_PEA_1_T10 (SEQ ID NO: 9) | 631 | 807 |
| Z21368_PEA_1_T11 (SEQ ID NO: 10) | 631 | 807 |
| Z21368_PEA_1_T23 (SEQ ID NO: 11) | 631 | 807 |
| Z21368_PEA_1_T24 (SEQ ID NO: 12) | 631 | 807 |
| Z21368_PEA_1_T5 (SEQ ID NO: 13) | 469 | 645 |
| Z21368_PEA_1_T6 (SEQ ID NO: 14) | 469 | 645 |
| Z21368_PEA_1_T9 (SEQ ID NO: 15) | 496 | 672 |
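Because the same segment appears at different coordinates on different transcripts, it can help to express each row of Table 89 as a segment length plus an offset relative to one reference transcript. The sketch below assumes 1-based inclusive positions and arbitrarily picks Z21368_PEA_1_T10 as the reference; the helper names are hypothetical.

```python
TABLE_89 = {
    "Z21368_PEA_1_T10": (631, 807),
    "Z21368_PEA_1_T11": (631, 807),
    "Z21368_PEA_1_T23": (631, 807),
    "Z21368_PEA_1_T24": (631, 807),
    "Z21368_PEA_1_T5": (469, 645),
    "Z21368_PEA_1_T6": (469, 645),
    "Z21368_PEA_1_T9": (496, 672),
}


def segment_lengths(table):
    """Length of the segment on each transcript (1-based inclusive positions)."""
    return {name: end - start + 1 for name, (start, end) in table.items()}


def start_offsets(table, reference="Z21368_PEA_1_T10"):
    """Shift of each transcript's start position relative to the reference."""
    reference_start = table[reference][0]
    return {name: start - reference_start for name, (start, _) in table.items()}


print(segment_lengths(TABLE_89))
print(start_offsets(TABLE_89))
```

With the Table 89 values, the segment is 177 nucleotides long on every transcript, and the T5/T6 and T9 coordinates are shifted by −162 and −135 relative to T10.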

Segment cluster Z21368_PEA_1_node_19 (SEQ ID NO:1069) according to the present invention is supported by 24 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10 (SEQ ID NO:9), Z21368_PEA_1_T11 (SEQ ID NO:10), Z21368_PEA_1_T23 (SEQ ID NO:11), Z21368_PEA_1_T24 (SEQ ID NO:12), Z21368_PEA_1_T5 (SEQ ID NO:13) and Z21368_PEA_1_T6 (SEQ ID NO:14). Table 90 below describes the starting and ending position of this segment on each transcript.

TABLE 90 Segment location on transcripts

| Transcript name | Segment starting position | Segment ending position |
|---|---|---|
| Z21368_PEA_1_T10 (SEQ ID NO: 9) | 863 | 1102 |
| Z21368_PEA_1_T11 (SEQ ID NO: 10) | 863 | 1102 |
| Z21368_PEA_1_T23 (SEQ ID NO: 11) | 863 | 1102 |
| Z21368_PEA_1_T24 (SEQ ID NO: 12) | 863 | 1102 |
| Z21368_PEA_1_T5 (SEQ ID NO: 13) | 701 | 940 |
| Z21368_PEA_1_T6 (SEQ ID NO: 14) | 701 | 940 |

Segment cluster Z21368_PEA_1_node_2 (SEQ ID NO:1070) according to the present invention is supported by 15 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10 (SEQ ID NO:9), Z21368_PEA_1_T11 (SEQ ID NO:10), Z21368_PEA_1_T23 (SEQ ID NO:11), Z21368_PEA_1_T24 (SEQ ID NO:12), Z21368_PEA_1_T5 (SEQ ID NO:13) and Z21368_PEA_1_T6 (SEQ ID NO:14). Table 91 below describes the starting and ending position of this segment on each transcript.

TABLE 91 Segment location on transcripts

| Transcript name | Segment starting position | Segment ending position |
|---|---|---|
| Z21368_PEA_1_T10 (SEQ ID NO: 9) | 1 | 300 |
| Z21368_PEA_1_T11 (SEQ ID NO: 10) | 1 | 300 |
| Z21368_PEA_1_T23 (SEQ ID NO: 11) | 1 | 300 |
| Z21368_PEA_1_T24 (SEQ ID NO: 12) | 1 | 300 |
| Z21368_PEA_1_T5 (SEQ ID NO: 13) | 1 | 300 |
| Z21368_PEA_1_T6 (SEQ ID NO: 14) | 1 | 300 |

Segment cluster Z21368_PEA_1_node_21 (SEQ ID NO:1071) according to the present invention is supported by 37 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10 (SEQ ID NO:9), Z21368_PEA_1_T23 (SEQ ID NO:11), Z21368_PEA_1_T24 (SEQ ID NO:12), Z21368_PEA_1_T5 (SEQ ID NO:13), Z21368_PEA_1_T6 (SEQ ID NO:14) and Z21368_PEA_1_T9 (SEQ ID NO:15). Table 92 below describes the starting and ending position of this segment on each transcript.

TABLE 92 Segment location on transcripts

| Transcript name | Segment starting position | Segment ending position |
|---|---|---|
| Z21368_PEA_1_T10 (SEQ ID NO: 9) | 1103 | 1254 |
| Z21368_PEA_1_T23 (SEQ ID NO: 11) | 1103 | 1254 |
| Z21368_PEA_1_T24 (SEQ ID NO: 12) | 1103 | 1254 |
| Z21368_PEA_1_T5 (SEQ ID NO: 13) | 941 | 1092 |
| Z21368_PEA_1_T6 (SEQ ID NO: 14) | 941 | 1092 |
| Z21368_PEA_1_T9 (SEQ ID NO: 15) | 728 | 879 |

Segment cluster Z21368_PEA_1_node_33 (SEQ ID NO:1072) according to the present invention is supported by 45 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10 (SEQ ID NO:9), Z21368_PEA_1_T11 (SEQ ID NO:10), Z21368_PEA_1_T23 (SEQ ID NO:11), Z21368_PEA_1_T24 (SEQ ID NO:12), Z21368_PEA_1_T5 (SEQ ID NO:13), Z21368_PEA_1_T6 (SEQ ID NO:14) and Z21368_PEA_1_T9 (SEQ ID NO:15). Table 93 below describes the starting and ending position of this segment on each transcript.

TABLE 93 Segment location on transcripts

| Transcript name | Segment starting position | Segment ending position |
|---|---|---|
| Z21368_PEA_1_T10 (SEQ ID NO: 9) | 1502 | 1677 |
| Z21368_PEA_1_T11 (SEQ ID NO: 10) | 1424 | 1599 |
| Z21368_PEA_1_T23 (SEQ ID NO: 11) | 1576 | 1751 |
| Z21368_PEA_1_T24 (SEQ ID NO: 12) | 1576 | 1751 |
| Z21368_PEA_1_T5 (SEQ ID NO: 13) | 1414 | 1589 |
| Z21368_PEA_1_T6 (SEQ ID NO: 14) | 1414 | 1589 |
| Z21368_PEA_1_T9 (SEQ ID NO: 15) | 1201 | 1376 |

Segment cluster Z21368_PEA_1_node_36 (SEQ ID NO:1073) according to the present invention is supported by 44 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10 (SEQ ID NO:9), Z21368_PEA_1_T11 (SEQ ID NO:10), Z21368_PEA_1_T23 (SEQ ID NO:11), Z21368_PEA_1_T24 (SEQ ID NO:12), Z21368_PEA_1_T5 (SEQ ID NO:13), Z21368_PEA_1_T6 (SEQ ID NO:14) and Z21368_PEA_1_T9 (SEQ ID NO:15). Table 94 below describes the starting and ending position of this segment on each transcript.

TABLE 94 Segment location on transcripts

| Transcript name | Segment starting position | Segment ending position |
|---|---|---|
| Z21368_PEA_1_T10 (SEQ ID NO: 9) | 1678 | 1806 |
| Z21368_PEA_1_T11 (SEQ ID NO: 10) | 1600 | 1728 |
| Z21368_PEA_1_T23 (SEQ ID NO: 11) | 1752 | 1880 |
| Z21368_PEA_1_T24 (SEQ ID NO: 12) | 1752 | 1880 |
| Z21368_PEA_1_T5 (SEQ ID NO: 13) | 1590 | 1718 |
| Z21368_PEA_1_T6 (SEQ ID NO: 14) | 1590 | 1718 |
| Z21368_PEA_1_T9 (SEQ ID NO: 15) | 1377 | 1505 |
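Tables 93 and 94 list two segments that sit next to each other on every transcript: node_36 starts one position after node_33 ends. A minimal sketch of that adjacency check follows, using the ending positions from Table 93 and the starting positions from Table 94; the variable and function names are illustrative assumptions.

```python
NODE_33_END = {      # segment ending positions, Table 93
    "Z21368_PEA_1_T10": 1677, "Z21368_PEA_1_T11": 1599,
    "Z21368_PEA_1_T23": 1751, "Z21368_PEA_1_T24": 1751,
    "Z21368_PEA_1_T5": 1589, "Z21368_PEA_1_T6": 1589,
    "Z21368_PEA_1_T9": 1376,
}
NODE_36_START = {    # segment starting positions, Table 94
    "Z21368_PEA_1_T10": 1678, "Z21368_PEA_1_T11": 1600,
    "Z21368_PEA_1_T23": 1752, "Z21368_PEA_1_T24": 1752,
    "Z21368_PEA_1_T5": 1590, "Z21368_PEA_1_T6": 1590,
    "Z21368_PEA_1_T9": 1377,
}


def adjacency(prev_end, next_start):
    """Map each shared transcript to True when the next segment begins
    exactly one position after the previous segment ends."""
    shared = prev_end.keys() & next_start.keys()
    return {name: next_start[name] == prev_end[name] + 1 for name in shared}


result = adjacency(NODE_33_END, NODE_36_START)
print(all(result.values()))
```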

Segment cluster Z21368_PEA_1_node_37 (SEQ ID NO:1074) according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T24 (SEQ ID NO:12). Table 95 below describes the starting and ending position of this segment on each transcript.

TABLE 95 Segment location on transcripts

| Transcript name | Segment starting position | Segment ending position |
|---|---|---|
| Z21368_PEA_1_T24 (SEQ ID NO: 12) | 1881 | 2159 |

Segment cluster Z21368_PEA_1_node_39 (SEQ ID NO:1075) according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T23 (SEQ ID NO:11) and Z21368_PEA_1_T24 (SEQ ID NO:12). Table 96 below describes the starting and ending position of this segment on each transcript.

TABLE 96 Segment location on transcripts

| Transcript name | Segment starting position | Segment ending position |
|---|---|---|
| Z21368_PEA_1_T23 (SEQ ID NO: 11) | 1938 | 2790 |
| Z21368_PEA_1_T24 (SEQ ID NO: 12) | 2217 | 3069 |

Segment cluster Z21368_PEA_1_node_4 (SEQ ID NO:1076) according to the present invention is supported by 13 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10 (SEQ ID NO:9), Z21368_PEA_1_T11 (SEQ ID NO:10), Z21368_PEA_1_T23 (SEQ ID NO:11) and Z21368_PEA_1_T24 (SEQ ID NO:12). Table 97 below describes the starting and ending position of this segment on each transcript.

TABLE 97 Segment location on transcripts

| Transcript name | Segment starting position | Segment ending position |
|---|---|---|
| Z21368_PEA_1_T10 (SEQ ID NO: 9) | 301 | 462 |
| Z21368_PEA_1_T11 (SEQ ID NO: 10) | 301 | 462 |
| Z21368_PEA_1_T23 (SEQ ID NO: 11) | 301 | 462 |
| Z21368_PEA_1_T24 (SEQ ID NO: 12) | 301 | 462 |

Segment cluster Z21368_PEA_1_node_41 (SEQ ID NO:1077) according to the present invention is supported by 49 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10 (SEQ ID NO:9), Z21368_PEA_1_T11 (SEQ ID NO:10), Z21368_PEA_1_T5 (SEQ ID NO:13), Z21368_PEA_1_T6 (SEQ ID NO:14) and Z21368_PEA_1_T9 (SEQ ID NO:15). Table 98 below describes the starting and ending position of this segment on each transcript.

TABLE 98 Segment location on transcripts

| Transcript name | Segment starting position | Segment ending position |
|---|---|---|
| Z21368_PEA_1_T10 (SEQ ID NO: 9) | 1864 | 1993 |
| Z21368_PEA_1_T11 (SEQ ID NO: 10) | 1786 | 1915 |
| Z21368_PEA_1_T5 (SEQ ID NO: 13) | 1776 | 1905 |
| Z21368_PEA_1_T6 (SEQ ID NO: 14) | 1776 | 1905 |
| Z21368_PEA_1_T9 (SEQ ID NO: 15) | 1563 | 1692 |

Segment cluster Z21368_PEA_1_node_43 (SEQ ID NO:1078) according to the present invention is supported by 52 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10 (SEQ ID NO:9), Z21368_PEA_1_T11 (SEQ ID NO:10), Z21368_PEA_1_T5 (SEQ ID NO:13), Z21368_PEA_1_T6 (SEQ ID NO:14) and Z21368_PEA_1_T9 (SEQ ID NO:15). Table 99 below describes the starting and ending position of this segment on each transcript.

TABLE 99 Segment location on transcripts

| Transcript name | Segment starting position | Segment ending position |
|---|---|---|
| Z21368_PEA_1_T10 (SEQ ID NO: 9) | 1994 | 2210 |
| Z21368_PEA_1_T11 (SEQ ID NO: 10) | 1916 | 2132 |
| Z21368_PEA_1_T5 (SEQ ID NO: 13) | 1906 | 2122 |
| Z21368_PEA_1_T6 (SEQ ID NO: 14) | 1906 | 2122 |
| Z21368_PEA_1_T9 (SEQ ID NO: 15) | 1693 | 1909 |

Segment cluster Z21368_PEA_1_node_45 (SEQ ID NO:1079) according to the present invention is supported by 64 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10 (SEQ ID NO:9), Z21368_PEA_1_T11 (SEQ ID NO:10), Z21368_PEA_1_T5 (SEQ ID NO:13), Z21368_PEA_1_T6 (SEQ ID NO:14) and Z21368_PEA_1_T9 (SEQ ID NO:15). Table 100 below describes the starting and ending position of this segment on each transcript.

TABLE 100 Segment location on transcripts

| Transcript name | Segment starting position | Segment ending position |
|---|---|---|
| Z21368_PEA_1_T10 (SEQ ID NO: 9) | 2211 | 2466 |
| Z21368_PEA_1_T11 (SEQ ID NO: 10) | 2133 | 2388 |
| Z21368_PEA_1_T5 (SEQ ID NO: 13) | 2123 | 2378 |
| Z21368_PEA_1_T6 (SEQ ID NO: 14) | 2123 | 2378 |
| Z21368_PEA_1_T9 (SEQ ID NO: 15) | 1910 | 2165 |

Segment cluster Z21368_PEA_1_node_53 (SEQ ID NO:1080) according to the present invention is supported by 60 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10 (SEQ ID NO:9), Z21368_PEA_1_T11 (SEQ ID NO:10), Z21368_PEA_1_T5 (SEQ ID NO:13), Z21368_PEA_1_T6 (SEQ ID NO:14) and Z21368_PEA_1_T9 (SEQ ID NO:15). Table 101 below describes the starting and ending position of this segment on each transcript.

TABLE 101 Segment location on transcripts

| Transcript name | Segment starting position | Segment ending position |
|---|---|---|
| Z21368_PEA_1_T10 (SEQ ID NO: 9) | 2725 | 2900 |
| Z21368_PEA_1_T11 (SEQ ID NO: 10) | 2647 | 2822 |
| Z21368_PEA_1_T5 (SEQ ID NO: 13) | 2637 | 2812 |
| Z21368_PEA_1_T6 (SEQ ID NO: 14) | 2637 | 2812 |
| Z21368_PEA_1_T9 (SEQ ID NO: 15) | 2424 | 2599 |

Segment cluster Z21368_PEA_1_node_56 (SEQ ID NO:1081) according to the present invention is supported by 50 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10 (SEQ ID NO:9), Z21368_PEA_1_T11 (SEQ ID NO:10) and Z21368_PEA_1_T9 (SEQ ID NO:15). Table 102 below describes the starting and ending position of this segment on each transcript.

TABLE 102 Segment location on transcripts

| Transcript name | Segment starting position | Segment ending position |
|---|---|---|
| Z21368_PEA_1_T10 (SEQ ID NO: 9) | 2901 | 3043 |
| Z21368_PEA_1_T11 (SEQ ID NO: 10) | 2823 | 2965 |
| Z21368_PEA_1_T9 (SEQ ID NO: 15) | 2600 | 2742 |

Segment cluster Z21368_PEA_1_node_58 (SEQ ID NO:1082) according to the present invention is supported by 71 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10 (SEQ ID NO:9), Z21368_PEA_1_T11 (SEQ ID NO:10), Z21368_PEA_1_T5 (SEQ ID NO:13), Z21368_PEA_1_T6 (SEQ ID NO:14) and Z21368_PEA_1_T9 (SEQ ID NO:15). Table 103 below describes the starting and ending position of this segment on each transcript.

TABLE 103 Segment location on transcripts

| Transcript name | Segment starting position | Segment ending position |
|---|---|---|
| Z21368_PEA_1_T10 (SEQ ID NO: 9) | 3044 | 3167 |
| Z21368_PEA_1_T11 (SEQ ID NO: 10) | 2966 | 3089 |
| Z21368_PEA_1_T5 (SEQ ID NO: 13) | 2813 | 2936 |
| Z21368_PEA_1_T6 (SEQ ID NO: 14) | 2813 | 2936 |
| Z21368_PEA_1_T9 (SEQ ID NO: 15) | 2743 | 2866 |

Segment cluster Z21368_PEA_1_node_66 (SEQ ID NO:1083) according to the present invention is supported by 142 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10 (SEQ ID NO:9), Z21368_PEA_1_T11 (SEQ ID NO:10), Z21368_PEA_1_T5 (SEQ ID NO:13), Z21368_PEA_1_T6 (SEQ ID NO:14) and Z21368_PEA_1_T9 (SEQ ID NO:15). Table 104 below describes the starting and ending position of this segment on each transcript.

TABLE 104 Segment location on transcripts

| Transcript name | Segment starting position | Segment ending position |
|---|---|---|
| Z21368_PEA_1_T10 (SEQ ID NO: 9) | 3202 | 3789 |
| Z21368_PEA_1_T11 (SEQ ID NO: 10) | 3124 | 3711 |
| Z21368_PEA_1_T5 (SEQ ID NO: 13) | 2971 | 3558 |
| Z21368_PEA_1_T6 (SEQ ID NO: 14) | 2971 | 3558 |
| Z21368_PEA_1_T9 (SEQ ID NO: 15) | 2901 | 3488 |

Segment cluster Z21368_PEA_1_node_67 (SEQ ID NO:1084) according to the present invention is supported by 181 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10 (SEQ ID NO:9), Z21368_PEA_1_T11 (SEQ ID NO:10), Z21368_PEA_1_T5 (SEQ ID NO:13), Z21368_PEA_1_T6 (SEQ ID NO:14) and Z21368_PEA_1_T9 (SEQ ID NO:15). Table 105 below describes the starting and ending position of this segment on each transcript.

TABLE 105 Segment location on transcripts

| Transcript name | Segment starting position | Segment ending position |
|---|---|---|
| Z21368_PEA_1_T10 (SEQ ID NO: 9) | 3790 | 4374 |
| Z21368_PEA_1_T11 (SEQ ID NO: 10) | 3712 | 4296 |
| Z21368_PEA_1_T5 (SEQ ID NO: 13) | 3559 | 4143 |
| Z21368_PEA_1_T6 (SEQ ID NO: 14) | 3559 | 4143 |
| Z21368_PEA_1_T9 (SEQ ID NO: 15) | 3489 | 4073 |

Segment cluster Z21368_PEA_1_node_69 (SEQ ID NO:1085) according to the present invention is supported by 150 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10 (SEQ ID NO:9), Z21368_PEA_1_T11 (SEQ ID NO:10), Z21368_PEA_1_T5 (SEQ ID NO:13), Z21368_PEA_1_T6 (SEQ ID NO:14) and Z21368_PEA_1_T9 (SEQ ID NO:15). Table 106 below describes the starting and ending position of this segment on each transcript.

TABLE 106 Segment location on transcripts

| Transcript name | Segment starting position | Segment ending position |
|---|---|---|
| Z21368_PEA_1_T10 (SEQ ID NO: 9) | 4428 | 4755 |
| Z21368_PEA_1_T11 (SEQ ID NO: 10) | 4350 | 4677 |
| Z21368_PEA_1_T5 (SEQ ID NO: 13) | 4197 | 5384 |
| Z21368_PEA_1_T6 (SEQ ID NO: 14) | 4197 | 4524 |
| Z21368_PEA_1_T9 (SEQ ID NO: 15) | 4127 | 4454 |
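Since a given segment would ordinarily be expected to have the same length wherever it occurs, a simple consistency check can flag rows whose length differs from the most common value. The sketch below applies that check to Table 106 exactly as the values appear here; the function name is hypothetical, and the check only reports a difference without indicating which value, if any, is in error.

```python
from collections import Counter

TABLE_106 = {
    "Z21368_PEA_1_T10": (4428, 4755),
    "Z21368_PEA_1_T11": (4350, 4677),
    "Z21368_PEA_1_T5": (4197, 5384),
    "Z21368_PEA_1_T6": (4197, 4524),
    "Z21368_PEA_1_T9": (4127, 4454),
}


def length_outliers(table):
    """Return (most common segment length, {transcript: length} for rows
    whose 1-based inclusive length differs from it)."""
    lengths = {name: end - start + 1 for name, (start, end) in table.items()}
    common_length, _ = Counter(lengths.values()).most_common(1)[0]
    outliers = {name: n for name, n in lengths.items() if n != common_length}
    return common_length, outliers


common, outliers = length_outliers(TABLE_106)
print(common, outliers)
```

With the values as listed, four rows give a length of 328 and the Z21368_PEA_1_T5 row gives 1188; this section does not state whether that difference is intended.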

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
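The split between the segments described above and the short segments that follow can be sketched as a length test. The example below treats "up to about 120 bp" as a hard cut-off of 120 nucleotides, which is a simplifying assumption, and uses the Z21368_PEA_1_T10 coordinates from Tables 89, 103, 107 and 108; the constant and function names are illustrative.

```python
SHORT_SEGMENT_MAX_BP = 120   # "up to about 120 bp", taken here as a hard limit

# (start, end) on Z21368_PEA_1_T10, 1-based inclusive
SEGMENTS_ON_T10 = {
    "Z21368_PEA_1_node_15": (631, 807),     # Table 89
    "Z21368_PEA_1_node_58": (3044, 3167),   # Table 103
    "Z21368_PEA_1_node_11": (558, 602),     # Table 107
    "Z21368_PEA_1_node_12": (603, 630),     # Table 108
}


def is_short_segment(start, end, limit=SHORT_SEGMENT_MAX_BP):
    """True when the segment length does not exceed the short-segment limit."""
    return (end - start + 1) <= limit


for name, (start, end) in SEGMENTS_ON_T10.items():
    print(name, end - start + 1, is_short_segment(start, end))
```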

Segment cluster Z21368_PEA_1_node_11 (SEQ ID NO:1086) according to the present invention is supported by 26 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10 (SEQ ID NO:9), Z21368_PEA_1_T11 (SEQ ID NO:10), Z21368_PEA_1_T23 (SEQ ID NO:11), Z21368_PEA_1_T24 (SEQ ID NO:12), Z21368_PEA_1_T5 (SEQ ID NO:13), Z21368_PEA_1_T6 (SEQ ID NO:14) and Z21368_PEA_1_T9 (SEQ ID NO:15). Table 107 below describes the starting and ending position of this segment on each transcript.

TABLE 107 Segment location on transcripts

| Transcript name | Segment starting position | Segment ending position |
|---|---|---|
| Z21368_PEA_1_T10 (SEQ ID NO: 9) | 558 | 602 |
| Z21368_PEA_1_T11 (SEQ ID NO: 10) | 558 | 602 |
| Z21368_PEA_1_T23 (SEQ ID NO: 11) | 558 | 602 |
| Z21368_PEA_1_T24 (SEQ ID NO: 12) | 558 | 602 |
| Z21368_PEA_1_T5 (SEQ ID NO: 13) | 396 | 440 |
| Z21368_PEA_1_T6 (SEQ ID NO: 14) | 396 | 440 |
| Z21368_PEA_1_T9 (SEQ ID NO: 15) | 423 | 467 |

Segment cluster Z21368_PEA_1_node_12 (SEQ ID NO:1087) according to the present invention is supported by 23 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10 (SEQ ID NO:9), Z21368_PEA_1_T11 (SEQ ID NO:10), Z21368_PEA_1_T23 (SEQ ID NO:11), Z21368_PEA_1_T24 (SEQ ID NO:12), Z21368_PEA_1_T5 (SEQ ID NO:13), Z21368_PEA_1_T6 (SEQ ID NO:14) and Z21368_PEA_1_T9 (SEQ ID NO:15). Table 108 below describes the starting and ending position of this segment on each transcript.

TABLE 108 Segment location on transcripts

| Transcript name | Segment starting position | Segment ending position |
|---|---|---|
| Z21368_PEA_1_T10 (SEQ ID NO: 9) | 603 | 630 |
| Z21368_PEA_1_T11 (SEQ ID NO: 10) | 603 | 630 |
| Z21368_PEA_1_T23 (SEQ ID NO: 11) | 603 | 630 |
| Z21368_PEA_1_T24 (SEQ ID NO: 12) | 603 | 630 |
| Z21368_PEA_1_T5 (SEQ ID NO: 13) | 441 | 468 |
| Z21368_PEA_1_T6 (SEQ ID NO: 14) | 441 | 468 |
| Z21368_PEA_1_T9 (SEQ ID NO: 15) | 468 | 495 |

Segment cluster Z21368_PEA_1_node_16 (SEQ ID NO:1088) according to the present invention can be found in the following transcript(s): Z21368_PEA_1_T10 (SEQ ID NO:9), Z21368_PEA_1_T11 (SEQ ID NO:10), Z21368_PEA_1_T23 (SEQ ID NO:11), Z21368_PEA_1_T24 (SEQ ID NO:12), Z21368_PEA_1_T5 (SEQ ID NO:13), Z21368_PEA_1_T6 (SEQ ID NO:14) and Z21368_PEA_1_T9 (SEQ ID NO:15). Table 109 below describes the starting and ending position of this segment on each transcript.

TABLE 109 Segment location on transcripts

| Transcript name | Segment starting position | Segment ending position |
|---|---|---|
| Z21368_PEA_1_T10 (SEQ ID NO: 9) | 808 | 822 |
| Z21368_PEA_1_T11 (SEQ ID NO: 10) | 808 | 822 |
| Z21368_PEA_1_T23 (SEQ ID NO: 11) | 808 | 822 |
| Z21368_PEA_1_T24 (SEQ ID NO: 12) | 808 | 822 |
| Z21368_PEA_1_T5 (SEQ ID NO: 13) | 646 | 660 |
| Z21368_PEA_1_T6 (SEQ ID NO: 14) | 646 | 660 |
| Z21368_PEA_1_T9 (SEQ ID NO: 15) | 673 | 687 |

Segment cluster Z21368_PEA_1_node_17 (SEQ ID NO:1089) according to the present invention is supported by 19 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10 (SEQ ID NO:9), Z21368_PEA_1_T11 (SEQ ID NO:10), Z21368_PEA_1_T23 (SEQ ID NO:11), Z21368_PEA_1_T24 (SEQ ID NO:12), Z21368_PEA_1_T5 (SEQ ID NO:13), Z21368_PEA_1_T6 (SEQ ID NO:14) and Z21368_PEA_1_T9 (SEQ ID NO:15). Table 110 below describes the starting and ending position of this segment on each transcript.

TABLE 110 Segment location on transcripts

| Transcript name | Segment starting position | Segment ending position |
|---|---|---|
| Z21368_PEA_1_T10 (SEQ ID NO: 9) | 823 | 862 |
| Z21368_PEA_1_T11 (SEQ ID NO: 10) | 823 | 862 |
| Z21368_PEA_1_T23 (SEQ ID NO: 11) | 823 | 862 |
| Z21368_PEA_1_T24 (SEQ ID NO: 12) | 823 | 862 |
| Z21368_PEA_1_T5 (SEQ ID NO: 13) | 661 | 700 |
| Z21368_PEA_1_T6 (SEQ ID NO: 14) | 661 | 700 |
| Z21368_PEA_1_T9 (SEQ ID NO: 15) | 688 | 727 |

Segment cluster Z21368_PEA_1_node_23 (SEQ ID NO:1090) according to the present invention is supported by 36 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T11 (SEQ ID NO:10), Z21368_PEA_1_T23 (SEQ ID NO:11), Z21368_PEA_1_T24 (SEQ ID NO:12), Z21368_PEA_1_T5 (SEQ ID NO:13), Z21368_PEA_1_T6 (SEQ ID NO:14) and Z21368_PEA_1_T9 (SEQ ID NO:15). Table 111 below describes the starting and ending position of this segment on each transcript.

TABLE 111 Segment location on transcripts

| Transcript name | Segment starting position | Segment ending position |
|---|---|---|
| Z21368_PEA_1_T11 (SEQ ID NO: 10) | 1103 | 1176 |
| Z21368_PEA_1_T23 (SEQ ID NO: 11) | 1255 | 1328 |
| Z21368_PEA_1_T24 (SEQ ID NO: 12) | 1255 | 1328 |
| Z21368_PEA_1_T5 (SEQ ID NO: 13) | 1093 | 1166 |
| Z21368_PEA_1_T6 (SEQ ID NO: 14) | 1093 | 1166 |
| Z21368_PEA_1_T9 (SEQ ID NO: 15) | 880 | 953 |

Segment cluster Z21368_PEA_1_node_24 (SEQ ID NO:1091) according to the present invention is supported by 36 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10 (SEQ ID NO:9), Z21368_PEA_1_T11 (SEQ ID NO:10), Z21368_PEA_1_T23 (SEQ ID NO:11), Z21368_PEA_1_T24 (SEQ ID NO:12), Z21368_PEA_1_T5 (SEQ ID NO:13), Z21368_PEA_1_T6 (SEQ ID NO:14) and Z21368_PEA_1_T9 (SEQ ID NO:15). Table 112 below describes the starting and ending position of this segment on each transcript.

TABLE 112 Segment location on transcripts

Transcript name                    Segment starting position   Segment ending position
Z21368_PEA_1_T10 (SEQ ID NO: 9)    1255                        1350
Z21368_PEA_1_T11 (SEQ ID NO: 10)   1177                        1272
Z21368_PEA_1_T23 (SEQ ID NO: 11)   1329                        1424
Z21368_PEA_1_T24 (SEQ ID NO: 12)   1329                        1424
Z21368_PEA_1_T5 (SEQ ID NO: 13)    1167                        1262
Z21368_PEA_1_T6 (SEQ ID NO: 14)    1167                        1262
Z21368_PEA_1_T9 (SEQ ID NO: 15)    954                         1049

Segment cluster Z21368_PEA_1_node_30 (SEQ ID NO:1092) according to the present invention is supported by 39 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10 (SEQ ID NO:9), Z21368_PEA_1_T11 (SEQ ID NO:10), Z21368_PEA_1_T23 (SEQ ID NO:11), Z21368_PEA_1_T24 (SEQ ID NO:12), Z21368_PEA_1_T5 (SEQ ID NO:13), Z21368_PEA_1_T6 (SEQ ID NO:14) and Z21368_PEA_1_T9 (SEQ ID NO:15). Table 113 below describes the starting and ending position of this segment on each transcript.

TABLE 113 Segment location on transcripts

Transcript name                    Segment starting position   Segment ending position
Z21368_PEA_1_T10 (SEQ ID NO: 9)    1351                        1409
Z21368_PEA_1_T11 (SEQ ID NO: 10)   1273                        1331
Z21368_PEA_1_T23 (SEQ ID NO: 11)   1425                        1483
Z21368_PEA_1_T24 (SEQ ID NO: 12)   1425                        1483
Z21368_PEA_1_T5 (SEQ ID NO: 13)    1263                        1321
Z21368_PEA_1_T6 (SEQ ID NO: 14)    1263                        1321
Z21368_PEA_1_T9 (SEQ ID NO: 15)    1050                        1108

Segment cluster Z21368_PEA_1_node_31 (SEQ ID NO:1093) according to the present invention is supported by 40 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10 (SEQ ID NO:9), Z21368_PEA_1_T11 (SEQ ID NO:10), Z21368_PEA_1_T23 (SEQ ID NO:11), Z21368_PEA_1_T24 (SEQ ID NO:12), Z21368_PEA_1_T5 (SEQ ID NO:13), Z21368_PEA_1_T6 (SEQ ID NO:14) and Z21368_PEA_1_T9 (SEQ ID NO:15). Table 114 below describes the starting and ending position of this segment on each transcript.

TABLE 114 Segment location on transcripts

Transcript name                    Segment starting position   Segment ending position
Z21368_PEA_1_T10 (SEQ ID NO: 9)    1410                        1501
Z21368_PEA_1_T11 (SEQ ID NO: 10)   1332                        1423
Z21368_PEA_1_T23 (SEQ ID NO: 11)   1484                        1575
Z21368_PEA_1_T24 (SEQ ID NO: 12)   1484                        1575
Z21368_PEA_1_T5 (SEQ ID NO: 13)    1322                        1413
Z21368_PEA_1_T6 (SEQ ID NO: 14)    1322                        1413
Z21368_PEA_1_T9 (SEQ ID NO: 15)    1109                        1200

Segment cluster Z21368_PEA_1_node_38 (SEQ ID NO:1094) according to the present invention is supported by 45 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10 (SEQ ID NO:9), Z21368_PEA_1_T11 (SEQ ID NO:10), Z21368_PEA_1_T23 (SEQ ID NO:11), Z21368_PEA_1_T24 (SEQ ID NO:12), Z21368_PEA_1_T5 (SEQ ID NO:13), Z21368_PEA_1_T6 (SEQ ID NO:14) and Z21368_PEA_1_T9 (SEQ ID NO:15). Table 115 below describes the starting and ending position of this segment on each transcript.

TABLE 115 Segment location on transcripts

Transcript name                    Segment starting position   Segment ending position
Z21368_PEA_1_T10 (SEQ ID NO: 9)    1807                        1863
Z21368_PEA_1_T11 (SEQ ID NO: 10)   1729                        1785
Z21368_PEA_1_T23 (SEQ ID NO: 11)   1881                        1937
Z21368_PEA_1_T24 (SEQ ID NO: 12)   2160                        2216
Z21368_PEA_1_T5 (SEQ ID NO: 13)    1719                        1775
Z21368_PEA_1_T6 (SEQ ID NO: 14)    1719                        1775
Z21368_PEA_1_T9 (SEQ ID NO: 15)    1506                        1562

Segment cluster Z21368_PEA_1_node_47 (SEQ ID NO:1095) according to the present invention is supported by 61 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10 (SEQ ID NO:9), Z21368_PEA_1_T11 (SEQ ID NO:10), Z21368_PEA_1_T5 (SEQ ID NO:13), Z21368_PEA_1_T6 (SEQ ID NO:14) and Z21368_PEA_1_T9 (SEQ ID NO:15). Table 116 below describes the starting and ending position of this segment on each transcript.

TABLE 116 Segment location on transcripts

Transcript name                    Segment starting position   Segment ending position
Z21368_PEA_1_T10 (SEQ ID NO: 9)    2467                        2563
Z21368_PEA_1_T11 (SEQ ID NO: 10)   2389                        2485
Z21368_PEA_1_T5 (SEQ ID NO: 13)    2379                        2475
Z21368_PEA_1_T6 (SEQ ID NO: 14)    2379                        2475
Z21368_PEA_1_T9 (SEQ ID NO: 15)    2166                        2262

Segment cluster Z21368_PEA_1_node_49 (SEQ ID NO:1096) according to the present invention is supported by 57 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10 (SEQ ID NO:9), Z21368_PEA_1_T11 (SEQ ID NO:10), Z21368_PEA_1_T5 (SEQ ID NO:13), Z21368_PEA_1_T6 (SEQ ID NO:14) and Z21368_PEA_1_T9 (SEQ ID NO:15). Table 117 below describes the starting and ending position of this segment on each transcript.

TABLE 117 Segment location on transcripts

Transcript name                    Segment starting position   Segment ending position
Z21368_PEA_1_T10 (SEQ ID NO: 9)    2564                        2658
Z21368_PEA_1_T11 (SEQ ID NO: 10)   2486                        2580
Z21368_PEA_1_T5 (SEQ ID NO: 13)    2476                        2570
Z21368_PEA_1_T6 (SEQ ID NO: 14)    2476                        2570
Z21368_PEA_1_T9 (SEQ ID NO: 15)    2263                        2357

Segment cluster Z21368_PEA_1_node_51 (SEQ ID NO:1097) according to the present invention is supported by 46 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10 (SEQ ID NO:9), Z21368_PEA_1_T11 (SEQ ID NO:10), Z21368_PEA_1_T5 (SEQ ID NO:13), Z21368_PEA_1_T6 (SEQ ID NO:14) and Z21368_PEA_1_T9 (SEQ ID NO:15). Table 118 below describes the starting and ending position of this segment on each transcript.

TABLE 118 Segment location on transcripts

Transcript name                    Segment starting position   Segment ending position
Z21368_PEA_1_T10 (SEQ ID NO: 9)    2659                        2724
Z21368_PEA_1_T11 (SEQ ID NO: 10)   2581                        2646
Z21368_PEA_1_T5 (SEQ ID NO: 13)    2571                        2636
Z21368_PEA_1_T6 (SEQ ID NO: 14)    2571                        2636
Z21368_PEA_1_T9 (SEQ ID NO: 15)    2358                        2423

Segment cluster Z21368_PEA_1_node_61 (SEQ ID NO:1098) according to the present invention is supported by 61 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10 (SEQ ID NO:9), Z21368_PEA_1_T11 (SEQ ID NO:10), Z21368_PEA_1_T5 (SEQ ID NO:13), Z21368_PEA_1_T6 (SEQ ID NO:14) and Z21368_PEA_1_T9 (SEQ ID NO:15). Table 119 below describes the starting and ending position of this segment on each transcript.

TABLE 119 Segment location on transcripts

Transcript name                    Segment starting position   Segment ending position
Z21368_PEA_1_T10 (SEQ ID NO: 9)    3168                        3201
Z21368_PEA_1_T11 (SEQ ID NO: 10)   3090                        3123
Z21368_PEA_1_T5 (SEQ ID NO: 13)    2937                        2970
Z21368_PEA_1_T6 (SEQ ID NO: 14)    2937                        2970
Z21368_PEA_1_T9 (SEQ ID NO: 15)    2867                        2900

Segment cluster Z21368_PEA_1_node_68 (SEQ ID NO:1099) according to the present invention is supported by 87 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10 (SEQ ID NO:9), Z21368_PEA_1_T11 (SEQ ID NO:10), Z21368_PEA_1_T5 (SEQ ID NO:13), Z21368_PEA_1_T6 (SEQ ID NO:14) and Z21368_PEA_1_T9 (SEQ ID NO:15). Table 120 below describes the starting and ending position of this segment on each transcript.

TABLE 120 Segment location on transcripts

Transcript name                    Segment starting position   Segment ending position
Z21368_PEA_1_T10 (SEQ ID NO: 9)    4375                        4427
Z21368_PEA_1_T11 (SEQ ID NO: 10)   4297                        4349
Z21368_PEA_1_T5 (SEQ ID NO: 13)    4144                        4196
Z21368_PEA_1_T6 (SEQ ID NO: 14)    4144                        4196
Z21368_PEA_1_T9 (SEQ ID NO: 15)    4074                        4126

Segment cluster Z21368_PEA_1_node_7 (SEQ ID NO:1100) according to the present invention is supported by 29 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z21368_PEA_1_T10 (SEQ ID NO:9), Z21368_PEA_1_T11 (SEQ ID NO:10), Z21368_PEA_1_T23 (SEQ ID NO:11), Z21368_PEA_1_T24 (SEQ ID NO:12), Z21368_PEA_1_T5 (SEQ ID NO:13), Z21368_PEA_1_T6 (SEQ ID NO:14) and Z21368_PEA_1_T9 (SEQ ID NO:15). Table 121 below describes the starting and ending position of this segment on each transcript.

TABLE 121 Segment location on transcripts

Transcript name                    Segment starting position   Segment ending position
Z21368_PEA_1_T10 (SEQ ID NO: 9)    463                         557
Z21368_PEA_1_T11 (SEQ ID NO: 10)   463                         557
Z21368_PEA_1_T23 (SEQ ID NO: 11)   463                         557
Z21368_PEA_1_T24 (SEQ ID NO: 12)   463                         557
Z21368_PEA_1_T5 (SEQ ID NO: 13)    301                         395
Z21368_PEA_1_T6 (SEQ ID NO: 14)    301                         395
Z21368_PEA_1_T9 (SEQ ID NO: 15)    328                         422

Overexpression of at least a portion of this cluster was determined according to oligonucleotides and one or more chips. The results were as follows: Oligonucleotide Z21368_0_0_61857 (SEQ ID NO: 207) was on the TAA chip and was found to be overexpressed in Lung cancer (general), in Lung adenocarcinoma, and in Lung squamous cell cancer.

Variant protein alignment to the previously known protein:

Sequence name: /tmp/5ER3vIMKE2/9L0Y7lDlTQ:SUL1_HUMAN (SEQ ID NO:1419)
Sequence documentation:
Alignment of: Z21368_PEA_1_P2 (SEQ ID NO:1289) x SUL1_HUMAN (SEQ ID NO:1419)
Alignment segment 1/1:
Quality: 7664.00
Escore: 0
Matching length: 761
Total length: 761
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment: (residue-level alignment not reproduced)

Sequence name: /tmp/tt3yfXIUKV/YxSTFWr66h:Q7Z2W2 (SEQ ID NO:1697)
Sequence documentation:
Alignment of: Z21368_PEA_1_P5 (SEQ ID NO:1290) x Q7Z2W2 (SEQ ID NO:1697)
Alignment segment 1/1:
Quality: 7869.00
Escore: 0
Matching length: 791
Total length: 871
Matching Percent Similarity: 99.87
Matching Percent Identity: 99.87
Total Percent Similarity: 90.70
Total Percent Identity: 90.70
Gaps: 1
Alignment: (residue-level alignment not reproduced)

Sequence name: /tmp/tt3yfXIUKV/YxSTFWr66h:AAH12997 (SEQ ID NO:1698)
Sequence documentation:
Alignment of: Z21368_PEA_1_P5 (SEQ ID NO:1290) x AAH12997 (SEQ ID NO:1698)
Alignment segment 1/1:
Quality: 420.00
Escore: 0
Matching length: 40
Total length: 40
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment: (residue-level alignment not reproduced)

Sequence name: /tmp/tt3yfXIUKV/YxSTFWr66h:SUL1_HUMAN (SEQ ID NO:1419)
Sequence documentation:
Alignment of: Z21368_PEA_1_P5 (SEQ ID NO:1290) x SUL1_HUMAN (SEQ ID NO:1419)
Alignment segment 1/1:
Quality: 7878.00
Escore: 0
Matching length: 791
Total length: 871
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 90.82
Total Percent Identity: 90.82
Gaps: 1
Alignment: (residue-level alignment not reproduced)
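The percentages reported for the two long Z21368_PEA_1_P5 alignments are consistent with a simple reading in which the "Matching" percentages are taken over the matching length and the "Total" percentages over the total length. This reading is an assumption, since the alignment output does not define the fields. The identical-residue counts used below (791 against SUL1_HUMAN and 790 against Q7Z2W2) are likewise inferred from the reported percentages, and the function name is illustrative only.

```python
def percent(count, length):
    """Percentage of `count` positions out of `length`, rounded to 2 decimals."""
    return round(100.0 * count / length, 2)


# Lengths reported for both Z21368_PEA_1_P5 alignments.
MATCHING_LENGTH = 791
TOTAL_LENGTH = 871

# Assumed identical-residue counts (inferred, not stated in the text).
identical_sul1 = 791    # SUL1_HUMAN: every matched position identical
identical_q7z2w2 = 790  # Q7Z2W2: one mismatch within the matched region

sul1_matching = percent(identical_sul1, MATCHING_LENGTH)
sul1_total = percent(identical_sul1, TOTAL_LENGTH)
q7_matching = percent(identical_q7z2w2, MATCHING_LENGTH)
q7_total = percent(identical_q7z2w2, TOTAL_LENGTH)

print(sul1_matching, sul1_total)  # compare with 100.00 and 90.82
print(q7_matching, q7_total)      # compare with 99.87 and 90.70
```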

Sequence name: /tmp/AVAZGWHuF0/RzHFOnHIsT:SUL1_HUMAN (SEQ ID NO:1419)
Sequence documentation:
Alignment of: Z21368_PEA_1_P15 (SEQ ID NO:1291) x SUL1_HUMAN (SEQ ID NO:1419)
Alignment segment 1/1:
Quality: 4174.00
Escore: 0
Matching length: 416
Total length: 416
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment: (residue-level alignment not reproduced)

Sequence name: /tmp/JhwgRdKqmt/kqSmjxkWWk:SUL1_HUMAN (SEQ ID NO:1419)
Sequence documentation:
Alignment of: Z21368_PEA_1_P16 (SEQ ID NO:1292) x SUL1_HUMAN (SEQ ID NO:1419)
Alignment segment 1/1:
Quality: 3985.00
Escore: 0
Matching length: 397
Total length: 397
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment: (residue-level alignment not reproduced)

Sequence name: /tmp/GPlnIw3BOg/zXFdxqG4ow:SUL1_HUMAN (SEQ ID NO:1419)
Sequence documentation:
Alignment of: Z21368_PEA_1_P22 (SEQ ID NO:1293) x SUL1_HUMAN (SEQ ID NO:1419)
Alignment segment 1/1:
Quality: 1897.00
Escore: 0
Matching length: 188
Total length: 188
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment: (residue-level alignment not reproduced)

Sequence name: /tmp/oji5Fs74fB/8xeB9KrGjp:Q7Z2W2 (SEQ ID NO:1697)
Sequence documentation:
Alignment of: Z21368_PEA_1_P23 (SEQ ID NO:1294) x Q7Z2W2 (SEQ ID NO:1697)
Alignment segment 1/1:
Quality: 1368.00
Escore: 0.000511
Matching length: 137
Total length: 137
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment: (residue-level alignment not reproduced)

Sequence name: /tmp/oji5Fs74fB/8xeB9KrGjp:SUL1_HUMAN (SEQ ID NO:1419)
Sequence documentation:
Alignment of: Z21368_PEA_1_P23 (SEQ ID NO:1294) x SUL1_HUMAN (SEQ ID NO:1419)
Alignment segment 1/1:
Quality: 1368.00
Escore: 0.000511
Matching length: 137
Total length: 137
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment: (residue-level alignment not reproduced)

Expression of SUL1_HUMAN (Extracellular sulfatase Sulf-1) Z21368 transcripts which are detectable by amplicon as depicted in sequence name Z21368junc17-21 (SEQ ID NO: 1642) in normal and cancerous lung tissues

Expression of SUL1_HUMAN (Extracellular sulfatase Sulf-1) transcripts detectable by or according to the junc17-21 segment, the Z21368junc17-21 amplicon (SEQ ID NO: 1642) and the primers Z21368junc17-21F (SEQ ID NO: 1640) and Z21368junc17-21R (SEQ ID NO: 1641) was measured by real time PCR. In parallel, the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323 (SEQ ID NO:1713); amplicon: PBGD-amplicon, SEQ ID NO:334), HPRT1 (GenBank Accession No. NM_000194 (SEQ ID NO:1714); amplicon: HPRT1-amplicon, SEQ ID NO:1297), Ubiquitin (GenBank Accession No. BC000449 (SEQ ID NO:1711); amplicon: Ubiquitin-amplicon, SEQ ID NO:328) and SDHA (GenBank Accession No. NM_004168 (SEQ ID NO:1712); amplicon: SDHA-amplicon, SEQ ID NO:331), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 47-50, 90-93, 96-99, Table 2, “Tissue samples in testing panel”, above), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples.
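The two-step calculation described above (normalization to the geometric mean of the housekeeping genes, then division by the median of the normal samples) can be sketched as follows. This is a minimal illustration only: the sample names and all quantities are hypothetical and are not data from the experiment, and the function names are not taken from the original analysis.

```python
from statistics import geometric_mean, median


def normalized_quantity(target_qty, housekeeping_qtys):
    """Target amplicon quantity divided by the geometric mean of the
    housekeeping gene quantities measured in the same RT sample."""
    return target_qty / geometric_mean(housekeeping_qtys)


def fold_up_regulation(normalized, normal_sample_ids):
    """Divide each normalized quantity by the median normalized quantity
    of the normal samples."""
    reference = median(normalized[s] for s in normal_sample_ids)
    return {s: q / reference for s, q in normalized.items()}


# Hypothetical quantities, for illustration only.
# sample -> (target quantity, [PBGD, HPRT1, Ubiquitin, SDHA] quantities)
samples = {
    "normal_1": (2.0, [2.0, 2.0, 2.0, 2.0]),
    "normal_2": (4.0, [1.0, 4.0, 1.0, 4.0]),
    "normal_3": (6.0, [2.0, 2.0, 2.0, 2.0]),
    "tumor_1": (40.0, [4.0, 4.0, 4.0, 4.0]),
}

normalized = {s: normalized_quantity(t, hk) for s, (t, hk) in samples.items()}
folds = fold_up_regulation(normalized, ["normal_1", "normal_2", "normal_3"])

for sample, fold in folds.items():
    print(f"{sample}: {fold:.2f}-fold relative to the normal median")
```

The geometric mean is used rather than the arithmetic mean so that a single highly expressed housekeeping gene does not dominate the normalization factor.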

FIG. 14 is a histogram showing over-expression of the above-indicated SUL1_HUMAN (Extracellular sulfatase Sulf-1) transcripts in cancerous lung samples relative to the normal samples. Values represent the average of duplicate experiments. Error bars indicate the minimal and maximal values obtained. As is evident from FIG. 14, the expression of SUL1_HUMAN (Extracellular sulfatase Sulf-1) transcripts detectable by the above amplicon in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 47-50, 90-93, 96-99, Table 2, “Tissue samples in testing panel”). Notably, an over-expression of at least 5 fold was found in 10 out of 15 adenocarcinoma samples, 7 out of 16 squamous cell carcinoma samples, 0 out of 4 large cell carcinoma samples and 0 out of 8 small cell carcinoma samples.

A threshold of 5 fold over-expression was found to differentiate between cancer and normal samples with a P value of 3.56E-04 in adenocarcinoma and 9.66E-03 in squamous cell carcinoma, as checked by exact Fisher test. The above values demonstrate statistical significance of the results.
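The exact Fisher test on the 5 fold threshold can be sketched with a 2x2 table of samples above and below the threshold in cancer versus normal tissue. The sketch below rests on assumptions that the text does not state: that the normal group consists of the 12 listed PM samples (Sample Nos. 47-50, 90-93, 96-99), that none of them reached the threshold, and that the test is one-sided. Under these assumptions the calculation gives the P values quoted above for adenocarcinoma and squamous cell carcinoma. The function name is illustrative only.

```python
from math import comb


def fisher_exact_one_sided(over_cancer, n_cancer, over_normal, n_normal):
    """One-sided exact Fisher P value: probability, with the table margins
    fixed, of at least `over_cancer` cancer samples above the threshold."""
    total = n_cancer + n_normal
    over_total = over_cancer + over_normal
    upper = min(n_cancer, over_total)
    favourable = sum(
        comb(n_cancer, k) * comb(n_normal, over_total - k)
        for k in range(over_cancer, upper + 1)
    )
    return favourable / comb(total, over_total)


# Counts above the 5 fold threshold as reported for FIG. 14;
# 0 of 12 normal samples above the threshold is an assumption.
p_adeno = fisher_exact_one_sided(10, 15, 0, 12)
p_squamous = fisher_exact_one_sided(7, 16, 0, 12)

print(f"adenocarcinoma: {p_adeno:.2E}")
print(f"squamous cell carcinoma: {p_squamous:.2E}")
```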

Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: Z21368junc17-21F forward primer (SEQ ID NO: 1640); and Z21368junc17-21R reverse primer (SEQ ID NO: 1641).

The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: Z21368junc17-21 (SEQ ID NO: 1642).

Forward primer (SEQ ID NO: 1640): GGACGGATACAGCAGGAACG
Reverse primer (SEQ ID NO: 1641): TATTTTCCAAAAAAGGCCAGCTC
Amplicon (SEQ ID NO: 1642): GGACGGATACAGCAGGAACGAAAAAACATCCGACCCAACATTATTCTTGTGCTTACCGATGATCAAGATGTGGAGCTGGCCTTTTTTGGAAAATA

Expression of SUL1_HUMAN (Extracellular sulfatase Sulf-1) Z21368 transcripts, which are detectable by amplicon as depicted in sequence name Z21368junc17-21 (SEQ ID NO: 1642) in different normal tissues

Expression of SUL1_HUMAN (Extracellular sulfatase Sulf-1) transcripts detectable by or according to the Z21368junc17-21 amplicon (SEQ ID NO: 1642) and the primers Z21368junc17-21F (SEQ ID NO: 1640) and Z21368junc17-21R (SEQ ID NO: 1641) was measured by real time PCR. In parallel, the expression of four housekeeping genes, RPL19 (GenBank Accession No. NM_000981 (SEQ ID NO:1715); RPL19 amplicon, SEQ ID NO:1630), TATA box (GenBank Accession No. NM_003194 (SEQ ID NO:1716); TATA amplicon, SEQ ID NO:1633), Ubiquitin (GenBank Accession No. BC000449 (SEQ ID NO:1711); amplicon: Ubiquitin-amplicon, SEQ ID NO:328) and SDHA (GenBank Accession No. NM_004168 (SEQ ID NO:1712); amplicon: SDHA-amplicon, SEQ ID NO:331), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the breast samples (Sample Nos. 33-35, Table 3, “Tissue samples in normal panel”, above), to obtain a value of relative expression of each sample relative to the median of the breast samples.

Forward primer (SEQ ID NO: 1640): GGACGGATACAGCAGGAACG
Reverse primer (SEQ ID NO: 1641): TATTTTCCAAAAAAGGCCAGCTC
Amplicon (SEQ ID NO: 1642): GGACGGATACAGCAGGAACGAAAAAACATCCGACCCAACATTATTCTTGTGCTTACCGATGATCAAGATGTGGAGCTGGCCTTTTTTGGAAAATA

The results are shown in FIG. 15, demonstrating the expression of Extracellular sulfatase Sulf-1 Z21368 transcripts, which are detectable by amplicon as depicted in sequence name Z21368junc17-21 (SEQ ID NO: 1642), in different normal tissues.

Expression of SUL1_HUMAN (Extracellular sulfatase Sulf-1) Z21368 transcripts which are detectable by amplicon as depicted in sequence name Z21368seg39 (SEQ ID NO: 1645) in normal and cancerous lung tissues

Expression of SUL1_HUMAN (Extracellular sulfatase Sulf-1) transcripts detectable by or according to seg39, the Z21368seg39 amplicon (SEQ ID NO: 1645) and the primers Z21368seg39F (SEQ ID NO: 1643) and Z21368seg39R (SEQ ID NO: 1644) was measured by real time PCR. In parallel, the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323 (SEQ ID NO:1713); amplicon: PBGD-amplicon, SEQ ID NO:334), HPRT1 (GenBank Accession No. NM_000194 (SEQ ID NO:1714); amplicon: HPRT1-amplicon, SEQ ID NO:1297), Ubiquitin (GenBank Accession No. BC000449 (SEQ ID NO:1711); amplicon: Ubiquitin-amplicon, SEQ ID NO:328) and SDHA (GenBank Accession No. NM_004168 (SEQ ID NO:1712); amplicon: SDHA-amplicon, SEQ ID NO:331), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 47-50, 90-93, 96-99, Table 2, “Tissue samples in testing panel”), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples.

FIG. 16 is a histogram showing over-expression of the above-indicated SUL1_HUMAN (Extracellular sulfatase Sulf-1) transcripts in cancerous lung samples relative to the normal samples. Values represent the average of duplicate experiments. Error bars indicate the minimal and maximal values obtained.

As is evident from FIG. 16, the expression of SUL1_HUMAN (Extracellular sulfatase Sulf-1) transcripts detectable by the above amplicon in cancer samples was higher than in the non-cancerous samples (Sample Nos. 47-50, 90-93, 96-99, Table 2, “Tissue samples in testing panel”). Notably, an over-expression of at least 5 fold was found in 8 out of 15 adenocarcinoma samples, 5 out of 16 squamous cell carcinoma samples and 1 out of 4 large cell carcinoma samples.

Statistical analysis was applied to verify the significance of these results, as described below.

The P value for the difference in the expression levels of SUL1_HUMAN (Extracellular sulfatase Sulf-1) transcripts detectable by the above amplicon in lung cancer samples versus the normal tissue samples was determined by T test as 2.17E-04 in adenocarcinoma, 9.94E-03 in squamous cell carcinoma and 2.17E-01 in large cell carcinoma.

A threshold of 5 fold over-expression was found to differentiate between cancer and normal samples with a P value of 1.74E-02 in adenocarcinoma, 1.58E-01 in squamous cell carcinoma and 4.33E-01 in large cell carcinoma, as checked by exact Fisher test. The above values demonstrate statistical significance of the results.

Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: Z21368seg39F forward primer (SEQ ID NO: 1643); and Z21368seg39R reverse primer (SEQ ID NO: 1644).

The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: Z21368seg39 (SEQ ID NO: 1645).

Forward primer Z21368seg39F (SEQ ID NO: 1643): GTTGCATTTCTCAGTGCTGGTTT
Reverse primer Z21368seg39R (SEQ ID NO: 1644): AGGGTGCCGGGTGAGG
Amplicon Z21368seg39 (SEQ ID NO: 1645): GTTGCATTTCTCAGTGCTGGTTTCTAATCAGACCAGTGGATTGAGTTTCTCTACCATCCTCCCCACGTTCTTCTCTAAGCTGCCTCCAAGCCTCACCCGGCACCCT

Expression of SUL1_HUMAN (Extracellular sulfatase Sulf-1) Z21368 transcripts which are detectable by amplicon as depicted in sequence name Z21368seg39 (SEQ ID NO: 1645) in different normal tissues

Expression of SUL1_HUMAN (Extracellular sulfatase Sulf-1) transcripts detectable by or according to the Z21368seg39 amplicon (SEQ ID NO: 1645) and the primers Z21368seg39F (SEQ ID NO: 1643) and Z21368seg39R (SEQ ID NO: 1644) was measured by real time PCR. In parallel, the expression of four housekeeping genes, RPL19 (GenBank Accession No. NM_000981 (SEQ ID NO:1715); RPL19 amplicon, SEQ ID NO:1630), TATA box (GenBank Accession No. NM_003194 (SEQ ID NO:1716); TATA amplicon, SEQ ID NO:1633), UBC (GenBank Accession No. BC000449 (SEQ ID NO:1711); amplicon: Ubiquitin-amplicon, SEQ ID NO:328) and SDHA (GenBank Accession No. NM_004168 (SEQ ID NO:1712); amplicon: SDHA-amplicon, SEQ ID NO:331), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the breast samples (Sample Nos. 33-35, Table 3, above), to obtain a value of relative expression of each sample relative to the median of the breast samples.

Forward primer Z21368seg39F (SEQ ID NO: 1643): GTTGCATTTCTCAGTGCTGGTTT
Reverse primer Z21368seg39R (SEQ ID NO: 1644): AGGGTGCCGGGTGAGG
Amplicon Z21368seg39 (SEQ ID NO: 1645): GTTGCATTTCTCAGTGCTGGTTTCTAATCAGACCAGTGGATTGAGTTTCTCTACCATCCTCCCCACGTTCTTCTCTAAGCTGCCTCCAAGCCTCACCCGGCACCCT

The results are demonstrated in FIG. 17, showing expression of SUL1_HUMAN (Extracellular sulfatase Sulf-1) Z21368 transcripts, which are detectable by amplicon as depicted in sequence name Z21368seg39 (SEQ ID NO: 1645), in different normal tissues.

Expression of SULF1 Z21368 Transcripts which are Detectable by Amplicon as Depicted in Sequence Name Z21368_junc59-64F1R1 (SEQ ID NO: 1801) in Normal and Cancerous Lung Tissues

Expression of SULF1 transcripts detectable by or according to junc59-64, the Z21368_junc59-64F1R1 amplicon (SEQ ID NO: 1801) and the primers Z21368_junc59-64F1 (SEQ ID NO: 1799) and Z21368_junc59-64R1 (SEQ ID NO: 1800) was measured by real time PCR. In parallel, the expression of several housekeeping genes, HPRT1 (GenBank Accession No. NM_000194 (SEQ ID NO: 1714); amplicon: HPRT1-amplicon (SEQ ID NO: 1297)), PBGD (GenBank Accession No. BC019323 (SEQ ID NO: 1713); amplicon: PBGD-amplicon (SEQ ID NO: 334)), SDHA (GenBank Accession No. NM_004168 (SEQ ID NO: 1712); amplicon: SDHA-amplicon (SEQ ID NO: 331)) and Ubiquitin (GenBank Accession No. BC000449 (SEQ ID NO: 1711); amplicon: Ubiquitin-amplicon (SEQ ID NO: 328)), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the normalization factor calculated from the expression of these housekeeping genes as described in normalization method 2 in the “materials and methods” section. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal samples (sample numbers 51-64, 69 and 70, Table 2_1 above), to obtain a value of fold up-regulation for each sample relative to the median of the normal samples.

FIG. 114 is a histogram showing over-expression of the above-indicated SULF1 transcripts in cancerous lung samples relative to the normal samples.

As is evident from FIG. 114, the expression of SULF1 transcripts detectable by the above amplicon in non-small cell carcinoma samples (adenocarcinoma, squamous cell carcinoma and large cell carcinoma) was significantly higher than in the non-cancerous samples (sample numbers 51-64, 69 and 70, Table 2_1 above). Notably, an over-expression of at least 5 fold was found in 20 out of 57 non-small cell carcinoma samples: 9 out of 23 adenocarcinoma samples, 7 out of 24 squamous cell carcinoma samples and 4 out of 10 large cell carcinoma samples.

Statistical analysis was applied to verify the significance of these results, as described below.

The P value for the difference in the expression levels of SULF1 transcripts detectable by the above amplicon in lung non-small cell carcinoma samples versus the normal tissue samples was determined by T test as 1.60e-009. The P values for the difference in the expression levels of SULF1 transcripts detectable by the above amplicon in lung adenocarcinoma samples, lung squamous cell carcinoma samples and lung large cell carcinoma samples versus the normal tissue samples were determined by T test as 1.18e-005, 1.16e-004 and 9.83e-003, respectively.

A threshold of 5 fold over-expression was found to differentiate between non-small cell carcinoma and normal samples with a P value of 2.82e-003, as checked by exact Fisher test. The same threshold was found to differentiate between adenocarcinoma and normal samples with a P value of 3.86e-003, between squamous cell carcinoma and normal samples with a P value of 1.86e-002, and between large cell carcinoma and normal samples with a P value of 1.40e-002, in each case as checked by exact Fisher test.

The above values demonstrate statistical significance of the results.

Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: Z21368_junc59-64F1 forward primer (SEQ ID NO: 1799); and Z21368_junc59-64R1 reverse primer (SEQ ID NO: 1800).

The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: Z21368_junc59-64F1R1 (SEQ ID NO: 1801).

Forward Primer (Z21368_junc59-64F1) (SEQ ID NO: 1799): AACAACCGTAGGAGGAAGAAGGA
Reverse Primer (Z21368_junc59-64R1) (SEQ ID NO: 1800): GTGTGCACTGTATTTGTGAGGGTTC
Amplicon (Z21368_junc59-64F1R1) (SEQ ID NO: 1801): AACAACCGTAGGAGGAAGAAGGAGAGGAAGGAGAAGAGACGGCAGAGGAAGGGGGAAGAGTGCAGCCTGCCTGGCCTCACTTGCTTCACGCATGACAACAACCACTGGCAGACAGCCCCGTTCTGGAACCCTCACAAATACAGTGCACAC

Expression of SULF1 Z21368 Transcripts which are Detectable by Amplicon as Depicted in Sequence Name Z21368_junc59-64F1R1 (SEQ ID NO: 1801) in Different Normal Tissues

Expression of SULF1 transcripts detectable by or according to the junc59-64 amplicon Z21368_junc59-64F1R1 (SEQ ID NO: 1801) and primers Z21368_junc59-64F1 (SEQ ID NO: 1799) and Z21368_junc59-64R1 (SEQ ID NO: 1800) was measured by real time PCR. Non-detected samples (sample no. 51) were assigned a Ct value of 41 and were calculated accordingly. In parallel, the expression of several housekeeping genes, namely SDHA (GenBank Accession No. NM_004168 (SEQ ID NO: 1712); amplicon: SDHA-amplicon (SEQ ID NO: 331)), Ubiquitin (GenBank Accession No. BC000449 (SEQ ID NO: 1711); amplicon: Ubiquitin-amplicon (SEQ ID NO: 328)), RPL19 (GenBank Accession No. NM_000981 (SEQ ID NO: 1715); RPL19 amplicon (SEQ ID NO: 1630)) and TATA box (GenBank Accession No. NM_003194 (SEQ ID NO: 1716); TATA amplicon (SEQ ID NO: 1633)), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the normalization factor calculated from the expression of these housekeeping genes as described in normalization method 2 in the “materials and methods” section. The normalized quantity of each RT sample was then divided by the median of the quantities of the lung samples (sample numbers 26, 28, 29 and 30, Table 3_1 above), to obtain a value of relative expression of each sample relative to the median of the lung samples.

Forward Primer (Z21368_junc59-64F1) (SEQ ID NO: 1799): AACAACCGTAGGAGGAAGAAGGA
Reverse Primer (Z21368_junc59-64R1) (SEQ ID NO: 1800): GTGTGCACTGTATTTGTGAGGGTTC
Amplicon (Z21368_junc59-64F1R1) (SEQ ID NO: 1801): AACAACCGTAGGAGGAAGAAGGAGAGGAAGGAGAAGAGACGGCAGAGGAAGGGGGAAGAGTGCAGCCTGCCTGGCCTCACTTGCTTCACGCATGACAACAACCACTGGCAGACAGCCCCGTTCTGGAACCCTCACAAATACAGTGCACAC

FIG. 115 is a histogram showing the expression of SULF1 Z21368 transcripts which are detectable by amplicon as depicted in sequence name Z21368_junc59-64F1R1 (SEQ ID NO: 1801) in different normal tissues.

Description for Cluster HUMGRP5E

Cluster HUMGRP5E features 2 transcript(s) and 5 segment(s) of interest, the names for which are given in Tables 122 and 123, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 124.

TABLE 122 Transcripts of interest
Transcript Name    Sequence ID No.
HUMGRP5E_T4        20
HUMGRP5E_T5        21

TABLE 123 Segments of interest
Segment Name       Sequence ID No.
HUMGRP5E_node_0    335
HUMGRP5E_node_2    336
HUMGRP5E_node_8    337
HUMGRP5E_node_3    338
HUMGRP5E_node_7    339

TABLE 124 Proteins of interest
Protein Name       Sequence ID No.
HUMGRP5E_P4        1299
HUMGRP5E_P5        1300

These sequences are variants of the known protein Gastrin-releasing peptide precursor (SwissProt accession identifier GRP_HUMAN; known also according to the synonyms GRP; GRP-10), SEQ ID NO: 1421, referred to herein as the previously known protein. Known isoforms of the GRP protein are described in sp_vs|P07492-2|GRP_HUMAN Isoform 2 (SEQ ID NO: 1788) and sp_vs|P07492-3|GRP_HUMAN Isoform 3 (SEQ ID NO: 1789).

Gastrin-releasing peptide is known or believed to have the following function(s): stimulates gastrin release as well as other gastrointestinal hormones. The sequence for protein Gastrin-releasing peptide precursor (SEQ ID NO:1421) is given at the end of the application, as “Gastrin-releasing peptide precursor amino acid sequence”. Known polymorphisms for this sequence are as shown in Table 125.

TABLE 125 Amino acid mutations for Known Protein
SNP position(s) on amino acid sequence    Comment
4                                         S -> R

Protein Gastrin-releasing peptide localization is believed to be Secreted.

The previously known protein also has the following indication(s) and/or potential therapeutic use(s): Diabetes, Type II. It has been investigated for clinical/therapeutic use in humans, for example as a target for an antibody or small molecule, and/or as a direct therapeutic; available information related to these investigations is as follows. Potential pharmaceutically related or therapeutically related activity or activities of the previously known protein are as follows: Bombesin antagonist; Insulinotropin agonist. A therapeutic role for a protein represented by the cluster has been predicted. The cluster was assigned this field because there was information in the drug database or the public databases (e.g., described herein above) that this protein, or part thereof, is used or can be used for a potential therapeutic indication: Anorectic/Antiobesity; Releasing hormone; Anticancer; Respiratory; Antidiabetic.

The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: signal transduction; neuropeptide signaling pathway, which are annotation(s) related to Biological Process; growth factor, which are annotation(s) related to Molecular Function; and secreted, which are annotation(s) related to Cellular Component.

The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <dot expasy dot ch/sprot/>; or Locuslink, available from <dot ncbi dot nlm dot nih dot gov/projects/LocusLink/>.

As noted above, cluster HUMGRP5E features 2 transcript(s), which were listed in Table 122 above. These transcript(s) encode for protein(s) which are variant(s) of protein Gastrin-releasing peptide precursor (SEQ ID NO:1421). A description of each variant protein according to the present invention is now provided.

Variant protein HUMGRP5E_P4 (SEQ ID NO:1299) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMGRP5E_T4 (SEQ ID NO:20). An alignment is given to the known protein (Gastrin-releasing peptide precursor (SEQ ID NO:1421)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HUMGRP5E_P4 (SEQ ID NO:1299) and GRP_HUMAN (SEQ ID NO:1421):

1. An isolated chimeric polypeptide encoding for HUMGRP5E_P4 (SEQ ID NO:1299), comprising a first amino acid sequence being at least 90% homologous to MRGSELPLVLLALVLCLAPRGRAVPLPAGGGTVLTKMYPRGNHWAVGHLMGKKSTGESSSVSERGSLKQQLREYIRWEEAARNLLGLIEAKENRNHQPPQPKALGNQQPSWDSEDSSNFKDVGSKGK corresponding to amino acids 1-127 of GRP_HUMAN (SEQ ID NO:1421), which also corresponds to amino acids 1-127 of HUMGRP5E_P4 (SEQ ID NO:1299), and a second amino acid sequence being at least 90% homologous to GSQREGRNPQLNQQ corresponding to amino acids 135-148 of GRP_HUMAN (SEQ ID NO:1421), which also corresponds to amino acids 128-141 of HUMGRP5E_P4 (SEQ ID NO:1299), wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated chimeric polypeptide encoding for an edge portion of HUMGRP5E_P4 (SEQ ID NO:1299), comprising a polypeptide having a length “n”, wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise KG, having a structure as follows: a sequence starting from any of amino acid numbers 127-x to 127; and ending at any of amino acid numbers 128+((n−2)−x), in which x varies from 0 to n−2.
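As a non-limiting sketch only, the edge-portion definition above can be enumerated programmatically. The sequence used here is an assumption: it is assembled from the two fragments quoted in the comparison report (amino acids 1-127 followed by GSQREGRNPQLNQQ) rather than read from the sequence listing, and the function name `edge_portions` is hypothetical.

```python
# Assumed reconstruction of HUMGRP5E_P4 from the two fragments quoted in the
# comparison report; the sequence listing itself is not consulted here.
HEAD = (
    "MRGSELPLVLLALVLCLAPRGRAVPLPAGGGTVLTKMYPRGNHWAVGHLMGKKSTGESSS"
    "VSERGSLKQQLREYIRWEEAARNLLGLIEAKENRNHQPPQPKALGNQQPSWDSEDSSNFK"
    "DVGSKGK"
)
TAIL = "GSQREGRNPQLNQQ"
P4 = HEAD + TAIL


def edge_portions(seq, junction, n):
    """Yield (start, end, peptide) for every length-n window spanning the junction.

    Positions are 1-based and inclusive. For x = 0..n-2 the window starts at
    junction - x and ends at (junction + 1) + ((n - 2) - x), as in the text.
    Windows that would run past either end of the sequence are skipped.
    """
    for x in range(n - 1):
        start = junction - x
        end = junction + 1 + (n - 2 - x)
        if start >= 1 and end <= len(seq):
            yield start, end, seq[start - 1:end]


for start, end, peptide in edge_portions(P4, junction=127, n=10):
    print(start, end, peptide)
```

Every window produced in this way has length n and contains the residues at positions 127 and 128, which is the property the definition relies on.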

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein HUMGRP5E_P4 (SEQ ID NO:1299) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 126 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMGRP5E_P4 (SEQ ID NO:1299) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 126 Amino acid mutations
SNP position(s) on amino acid sequence    Alternative amino acid(s)    Previously known SNP?
4                                         S -> R                       Yes

Variant protein HUMGRP5E_P4 (SEQ ID NO:1299) is encoded by the following transcript(s): HUMGRP5E_T4 (SEQ ID NO:20), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMGRP5E_T4 (SEQ ID NO:20) is shown in bold; this coding portion starts at position 622 and ends at position 1044. The transcript also has the following SNPs as listed in Table 127 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMGRP5E_P4 (SEQ ID NO:1299) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 127 Nucleic acid SNPs
SNP position on nucleotide sequence    Alternative nucleic acid    Previously known SNP?
541                                    -> T                        No
542                                    G -> T                      No
631                                    A -> C                      Yes
672                                    G -> A                      Yes
1340                                   C ->                        No
1340                                   C -> A                      No
1341                                   A ->                        No
1341                                   A -> G                      No
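The relationship between the nucleotide SNP positions of Table 127 and the stated coding portion (positions 622 to 1044) can be illustrated with a short sketch. The helper name `codon_of` is hypothetical, and the sketch assumes that the stated coding portion begins at the first nucleotide of the start codon and excludes the stop codon.

```python
def codon_of(nt_pos, cds_start, cds_end):
    """Map a 1-based transcript position to (codon number, position in codon).

    Both returned values are 1-based. Returns None when the position lies
    outside the coding portion (that is, in an untranslated region).
    """
    if not cds_start <= nt_pos <= cds_end:
        return None
    offset = nt_pos - cds_start
    return offset // 3 + 1, offset % 3 + 1


CDS_START, CDS_END = 622, 1044  # coding portion of HUMGRP5E_T4 as stated above

# Number of codons implied by the stated coding portion.
n_codons = (CDS_END - CDS_START + 1) // 3
print("codons:", n_codons)

for position in (541, 542, 631, 672, 1340, 1341):
    print(position, codon_of(position, CDS_START, CDS_END))
```

Under these assumptions, position 631 falls on the first nucleotide of codon 4, which agrees with the S -> R change at amino acid position 4 in Table 126, while positions 541, 542, 1340 and 1341 fall outside the coding portion.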

Variant protein HUMGRP5E_P5 (SEQ ID NO:1300) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMGRP5E_T5 (SEQ ID NO:21). An alignment is given to the known protein (Gastrin-releasing peptide precursor (SEQ ID NO:1421)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HUMGRP5E_P5 (SEQ ID NO:1300) and GRP_HUMAN (SEQ ID NO:1421):

1. An isolated chimeric polypeptide encoding for HUMGRP5E_P5 (SEQ ID NO:1300), comprising a first amino acid sequence being at least 90% homologous to MRGSELPLVLLALVLCLAPRGRAVPLPAGGGTVLTKMYPRGNHWAVGHLMGKKSTGESSSVSERGSLKQQLREYIRWEEAARNLLGLIEAKENRNHQPPQPKALGNQQPSWDSEDSSNFKDVGSKGK corresponding to amino acids 1-127 of GRP_HUMAN (SEQ ID NO:1421), which also corresponds to amino acids 1-127 of HUMGRP5E_P5 (SEQ ID NO:1300), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DSLLQVLNVKEGTPS (SEQ ID NO: 1764) corresponding to amino acids 128-142 of HUMGRP5E_P5 (SEQ ID NO:1300), wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of HUMGRP5E_P5 (SEQ ID NO:1300), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DSLLQVLNVKEGTPS (SEQ ID NO: 1764) in HUMGRP5E_P5 (SEQ ID NO:1300).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein HUMGRP5E_P5 (SEQ ID NO:1300) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 128 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMGRP5E_P5 (SEQ ID NO:1300) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 128 Amino acid mutations
SNP position(s) on amino acid sequence    Alternative amino acid(s)    Previously known SNP?
4                                         S -> R                       Yes

Variant protein HUMGRP5E_P5 (SEQ ID NO:1300) is encoded by the following transcript(s): HUMGRP5E_T5 (SEQ ID NO:21), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMGRP5E_T5 (SEQ ID NO:21) is shown in bold; this coding portion starts at position 622 and ends at position 1047. The transcript also has the following SNPs as listed in Table 129 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMGRP5E_P5 (SEQ ID NO:1300) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 129 Nucleic acid SNPs
SNP position on nucleotide sequence    Alternative nucleic acid    Previously known SNP?
541                                    -> T                        No
542                                    G -> T                      No
631                                    A -> C                      Yes
672                                    G -> A                      Yes
1354                                   C ->                        No
1354                                   C -> A                      No
1355                                   A ->                        No
1355                                   A -> G                      No

As noted above, cluster HUMGRP5E features 5 segment(s), which were listed in Table 123 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster HUMGRP5E_node_0 (SEQ ID NO:1130) according to the present invention is supported by 21 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMGRP5E_T4 (SEQ ID NO:20) and HUMGRP5E_T5 (SEQ ID NO:21). Table 130 below describes the starting and ending position of this segment on each transcript.

TABLE 130 Segment location on transcripts
Transcript name                  Segment starting position    Segment ending position
HUMGRP5E_T4 (SEQ ID NO: 20)      1                            760
HUMGRP5E_T5 (SEQ ID NO: 21)      1                            760

Segment cluster HUMGRP5E_node_2 (SEQ ID NO:1131) according to the present invention is supported by 27 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMGRP5E_T4 (SEQ ID NO:20) and HUMGRP5E_T5 (SEQ ID NO:21). Table 131 below describes the starting and ending position of this segment on each transcript.

TABLE 131 Segment location on transcripts
Transcript name                  Segment starting position    Segment ending position
HUMGRP5E_T4 (SEQ ID NO: 20)      761                          984
HUMGRP5E_T5 (SEQ ID NO: 21)      761                          984

Segment cluster HUMGRP5E_node_8 (SEQ ID NO:1132) according to the present invention is supported by 26 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMGRP5E_T4 (SEQ ID NO:20) and HUMGRP5E_T5 (SEQ ID NO:21). Table 132 below describes the starting and ending position of this segment on each transcript.

TABLE 132 Segment location on transcripts
Transcript name                  Segment starting position    Segment ending position
HUMGRP5E_T4 (SEQ ID NO: 20)      1004                         1362
HUMGRP5E_T5 (SEQ ID NO: 21)      1018                         1376

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.

Segment cluster HUMGRP5E_node_3 (SEQ ID NO:1133) according to the present invention can be found in the following transcript(s): HUMGRP5E_T4 (SEQ ID NO:20) and HUMGRP5E_T5 (SEQ ID NO:21). Table 133 below describes the starting and ending position of this segment on each transcript.

TABLE 133 Segment location on transcripts
Transcript name                  Segment starting position    Segment ending position
HUMGRP5E_T4 (SEQ ID NO: 20)      985                          1003
HUMGRP5E_T5 (SEQ ID NO: 21)      985                          1003

Segment cluster HUMGRP5E_node_7 (SEQ ID NO:1134) according to the present invention can be found in the following transcript(s): HUMGRP5E_T5 (SEQ ID NO:21). Table 134 below describes the starting and ending position of this segment on each transcript.

TABLE 134 Segment location on transcripts
Transcript name                  Segment starting position    Segment ending position
HUMGRP5E_T5 (SEQ ID NO: 21)      1004                         1017
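As an illustrative consistency check only, the segment coordinates of Tables 130 to 134 can be tested to confirm that the segments tile each transcript without gaps or overlaps. The dictionary layout and function names below are assumptions made for this sketch; the coordinates are those given in the tables.

```python
# Segment coordinates (1-based, inclusive) as given in Tables 130-134.
SEGMENTS = {
    "HUMGRP5E_T4": [
        ("HUMGRP5E_node_0", 1, 760),
        ("HUMGRP5E_node_2", 761, 984),
        ("HUMGRP5E_node_3", 985, 1003),
        ("HUMGRP5E_node_8", 1004, 1362),
    ],
    "HUMGRP5E_T5": [
        ("HUMGRP5E_node_0", 1, 760),
        ("HUMGRP5E_node_2", 761, 984),
        ("HUMGRP5E_node_3", 985, 1003),
        ("HUMGRP5E_node_7", 1004, 1017),
        ("HUMGRP5E_node_8", 1018, 1376),
    ],
}


def tiles_contiguously(segments):
    """Return True if the segments cover positions 1..N with no gap or overlap."""
    expected_start = 1
    for _, start, end in sorted(segments, key=lambda seg: seg[1]):
        if start != expected_start or end < start:
            return False
        expected_start = end + 1
    return True


def covered_length(segments):
    """Total number of nucleotides covered by the listed segments."""
    return sum(end - start + 1 for _, start, end in segments)


for transcript, segments in SEGMENTS.items():
    print(transcript, tiles_contiguously(segments), covered_length(segments))
```

On these coordinates the two transcripts differ only by segment HUMGRP5E_node_7, which is present in HUMGRP5E_T5 alone and shifts the downstream segment HUMGRP5E_node_8 by its own length.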

Microarray (chip) data is also available for this gene as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment (with regard to lung cancer), shown in Table 135.

TABLE 135 Oligonucleotides related to this gene
Oligonucleotide name                    Overexpressed in cancers    Chip reference
HUMGRP5E_0_0_16630 (SEQ ID NO: 208)     Lung cancer                 Lung
HUMGRP5E_0_2_0 (SEQ ID NO: 209)         Lung cancer                 Lung

Variant protein alignment to the previously known protein:

Sequence name: /tmp/412zs2mwyT/B0wj)UAX0d:GRP_HUMAN (SEQ ID NO:1421)
Sequence documentation:
Alignment of: HUMGRP5E_P4 (SEQ ID NO:1299) x GRP_HUMAN (SEQ ID NO:1421)
Alignment segment 1/1:
Quality: 1291.00
Escore: 0
Matching length: 141
Total length: 148
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 95.27
Total Percent Identity: 95.27
Gaps: 1
Alignment: (alignment rows not reproduced)

Sequence name: /tmp/1me9ldnvfv/KbP5io8PtU:GRP_HUMAN (SEQ ID NO:1421)
Sequence documentation:
Alignment of: HUMGRP5E_P5 (SEQ ID NO:1300) x GRP_HUMAN (SEQ ID NO:1421)
Alignment segment 1/1:
Quality: 1248.00
Escore: 0
Matching length: 127
Total length: 127
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment: (alignment rows not reproduced)
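The summary figures reported for the alignments above can be related to one another by simple arithmetic, sketched below. The function name is hypothetical, and the sketch assumes that "Total Percent Identity" is the number of identical aligned positions divided by the total alignment length; the alignment program's own definition is not given in the text.

```python
def total_percent_identity(identical_positions, total_length):
    """Percent identity over the whole alignment, rounded to two decimals."""
    return round(100.0 * identical_positions / total_length, 2)


# HUMGRP5E_P4 x GRP_HUMAN: matching length 141 at 100% matching identity,
# total alignment length 148 with one gap.
p4_identity = total_percent_identity(141, 148)
p4_gap_columns = 148 - 141

# HUMGRP5E_P5 x GRP_HUMAN: matching length 127, total length 127, no gaps.
p5_identity = total_percent_identity(127, 127)

print(p4_identity, p4_gap_columns, p5_identity)
```

Under this assumption the single gap in the HUMGRP5E_P4 alignment spans the columns that correspond to amino acids 128-134 of GRP_HUMAN, which are skipped in the variant according to the comparison report.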

The data given below show that HUMGRP5E splice variants of the present invention can be used as diagnostic agents for lung cancer. In particular, differential overexpression in Small Cell Lung Cancer cells (as opposed to normal lung cells and normal tissue of other types) was demonstrated through determination of mRNA expression, while antibodies selective for HUMGRP5E_P5 (SEQ ID NO: 1300) splice variant were found to be capable of detecting HUMGRP5E_P5 (SEQ ID NO: 1300) splice variant in human serum (blood samples), further confirming the existence of HUMGRP5E_P5 (SEQ ID NO: 1300) splice variant protein and its specific, differential expression in patients with small cell lung cancer. Antibodies raised against HUMGRP5E_P5 (SEQ ID NO: 1300) splice variant showed that HUMGRP5E_P5 (SEQ ID NO: 1300) splice variant is differentially detected in serum samples taken from subjects suffering from small cell lung carcinoma as compared to healthy subjects, thereby supporting the utility of HUMGRP5E_P5 (SEQ ID NO: 1300) splice variant as a diagnostic agent for lung cancer. The experiments were performed as described in greater detail below.

Expression of GRP_HUMAN - Gastrin-Releasing Peptide (HUMGRP5E) Transcripts which are Detectable by Amplicon as Depicted in Sequence Name HUMGRP5Ejunc3-7 (SEQ ID NO: 1648) in Normal and Cancerous Lung Tissues

Expression of GRP_HUMAN - gastrin-releasing peptide transcripts detectable by or according to HUMGRP5Ejunc3-7 amplicon (SEQ ID NO: 1648) and HUMGRP5Ejunc3-7F (SEQ ID NO: 1646) and HUMGRP5Ejunc3-7R (SEQ ID NO: 1647) primers was measured by real time PCR. In parallel, the expression of four housekeeping genes, namely PBGD (GenBank Accession No. BC019323 (SEQ ID NO:1713); amplicon: PBGD-amplicon, SEQ ID NO:334), HPRT1 (GenBank Accession No. NM_000194 (SEQ ID NO:1714); amplicon: HPRT1-amplicon, SEQ ID NO:1297), Ubiquitin (GenBank Accession No. BC000449 (SEQ ID NO:1711); amplicon: Ubiquitin-amplicon, SEQ ID NO:328) and SDHA (GenBank Accession No. NM_004168 (SEQ ID NO:1712); amplicon: SDHA-amplicon, SEQ ID NO:331), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 47-50, 90-93, 96-99, Table 2, “Tissue samples in testing sample”), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples.
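The normalization just described can be sketched as follows, by way of non-limiting example only. The function name, the sample identifiers and all quantities are hypothetical placeholders and are not measured values; how quantities are derived from Ct values is not addressed here.

```python
from statistics import geometric_mean, median


def fold_up_regulation(target_qty, housekeeping_qty, normal_sample_ids):
    """Normalize target quantities and express them relative to normal samples.

    target_qty: sample id -> quantity of the amplicon of interest.
    housekeeping_qty: sample id -> list of housekeeping gene quantities.
    normal_sample_ids: ids of the normal samples used as the reference.
    """
    # Step 1: divide by the geometric mean of the housekeeping quantities.
    normalized = {
        sample: target_qty[sample] / geometric_mean(housekeeping_qty[sample])
        for sample in target_qty
    }
    # Step 2: divide by the median normalized quantity of the normal samples.
    reference = median(normalized[sample] for sample in normal_sample_ids)
    return {sample: value / reference for sample, value in normalized.items()}


# Hypothetical quantities for two normal samples and one tumor sample.
target = {"N1": 2.0, "N2": 4.0, "T1": 60.0}
housekeeping = {"N1": [1.0, 4.0], "N2": [2.0, 2.0], "T1": [4.0, 1.0]}
result = fold_up_regulation(target, housekeeping, ["N1", "N2"])
print(result)
```

Using the geometric rather than the arithmetic mean keeps a single highly expressed housekeeping gene from dominating the normalization factor.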

FIG. 19 is a histogram showing overexpression of the above-indicated GRP_HUMAN - gastrin-releasing peptide transcripts in several cancerous lung samples relative to the normal samples. As is evident from FIG. 19, the expression of GRP_HUMAN - gastrin-releasing peptide transcripts detectable by the above amplicon in several cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 47-50, 90-93, 96-99, Table 2, “Tissue samples in testing sample”). Notably, an overexpression of at least 10-fold was found in 2 out of 15 adenocarcinoma samples, and in 7 out of 8 small cell carcinoma samples.

Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: HUMGRP5Ejunc3-7F forward primer (SEQ ID NO: 1646); and HUMGRP5Ejunc3-7R reverse primer (SEQ ID NO: 1647).

The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: HUMGRP5Ejunc3-7 (SEQ ID NO: 1648).

HUMGRP5Ejunc3-7F (SEQ ID NO: 1646): ACCAGCCACCTCAACCCA
HUMGRP5Ejunc3-7R (SEQ ID NO: 1647): CTGGAGCAGAGAGTCTTTGCCT
HUMGRP5Ejunc3-7 (SEQ ID NO: 1648): ACCAGCCACCTCAACCCAAGGCCCTGGGCAATCAGCAGCCTTCGTGGGATTCAGAGGATAGCAGCAACTTCAAAGATGTAGGTTCAAAAGGCAAAGACTCTCTGCTCCAG

Expression of GRP_HUMAN - gastrin-releasing peptide (HUMGRP5E) transcripts which are detectable by amplicon as depicted in sequence name HUMGRP5Ejunc3-7 (SEQ ID NO: 1648) in different normal tissues

Expression of GRP_HUMAN - gastrin-releasing peptide transcripts detectable by or according to HUMGRP5Ejunc3-7 amplicon (SEQ ID NO: 1648) and HUMGRP5Ejunc3-7F (SEQ ID NO: 1646) and HUMGRP5Ejunc3-7R (SEQ ID NO: 1647) primers was measured by real time PCR. In parallel, the expression of four housekeeping genes, namely RPL19 (GenBank Accession No. NM_000981 (SEQ ID NO:1715); RPL19 amplicon, SEQ ID NO:1630), TATA box (GenBank Accession No. NM_003194 (SEQ ID NO:1716); TATA amplicon, SEQ ID NO:1633), Ubiquitin (GenBank Accession No. BC000449 (SEQ ID NO:1711); amplicon: Ubiquitin-amplicon, SEQ ID NO:328) and SDHA (GenBank Accession No. NM_004168 (SEQ ID NO:1712); amplicon: SDHA-amplicon, SEQ ID NO:331), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the breast samples (Sample Nos. 33-35, Table 3, “Tissue samples on normal panel”, above), to obtain a value of relative expression of each sample relative to the median of the breast samples.

HUMGRP5Ejunc3-7F (SEQ ID NO: 1646): ACCAGCCACCTCAACCCA
HUMGRP5Ejunc3-7R (SEQ ID NO: 1647): CTGGAGCAGAGAGTCTTTGCCT
HUMGRP5Ejunc3-7 (SEQ ID NO: 1648): ACCAGCCACCTCAACCCAAGGCCCTGGGCAATCAGCAGCCTTCGTGGGATTCAGAGGATAGCAGCAACTTCAAAGATGTAGGTTCAAAAGGCAAAGACTCTCTGCTCCAG

The results are shown in FIG. 20, demonstrating the expression of GRP_HUMAN - gastrin-releasing peptide (HUMGRP5E) transcripts which are detectable by amplicon as depicted in sequence name HUMGRP5Ejunc3-7 in different normal tissues.
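As a simple illustrative check, a primer pair can be compared against its amplicon: the amplicon is expected to begin with the forward primer and to end with the reverse complement of the reverse primer. The function names below are hypothetical; the sequences are the HUMGRP5Ejunc3-7 primers and amplicon as listed above, with the amplicon written without internal spaces.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def primers_flank(amplicon, forward, reverse):
    """True if the amplicon starts with `forward` and ends with revcomp(`reverse`)."""
    return amplicon.startswith(forward) and amplicon.endswith(
        reverse_complement(reverse)
    )


FORWARD = "ACCAGCCACCTCAACCCA"
REVERSE = "CTGGAGCAGAGAGTCTTTGCCT"
AMPLICON = (
    "ACCAGCCACCTCAACCCAAGGCCCTGGGCAATCAGCAGCCTTCGTGGGATTCAGAGGATAGC"
    "AGCAACTTCAAAGATGTAGGTTCAAAAGGCAAAGACTCTCTGCTCCAG"
)

print(reverse_complement(REVERSE))
print(primers_flank(AMPLICON, FORWARD, REVERSE))
```

A check of this kind detects a single-base transcription error in either primer, since a mismatched primer no longer matches the corresponding end of the amplicon.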

Differential Expression of HUMGRP5E_P5 (SEQ ID NO:1300) in Small Cell Lung Carcinoma Patients as Compared to Healthy Subjects.

The HUMGRP5E_P5 (SEQ ID NO: 1300) variant of the present invention results from alternative splicing of the GRP gene and contains 142 amino acids. The first 121 amino acids from the N-terminus are shared with WT GRP 1 (SEQ ID NO:1421), WT GRP 2 (SEQ ID NO:1788) and WT GRP 3 (SEQ ID NO:1789). The next 6 amino acids are absent in WT GRP 2 (SEQ ID NO:1788), but appear in the other two known isoforms. In WT GRP 2 (SEQ ID NO:1788), the shared 121-amino-acid N-terminal region is followed by two amino acids which are absent in HUMGRP5E_P5 (SEQ ID NO: 1300) and in the two other known isoforms (SEQ ID NOs: 1421 and 1789). In addition, WT GRP 2 (SEQ ID NO:1788) shares with the HUMGRP5E_P5 (SEQ ID NO: 1300) splice variant of the present invention a tail of 15 C-terminal amino acids. Thus, the HUMGRP5E_P5 (SEQ ID NO: 1300) splice variant of the present invention has a novel bridge connecting the 127-amino-acid head of WT GRP isoforms 1 (SEQ ID NO:1421) and 3 (SEQ ID NO:1789) and the 15-amino-acid tail of WT GRP isoform 2 (SEQ ID NO:1788).

The alignment comparison of the known GRP isoforms and the HUMGRP5E_P5 (SEQ ID NO: 1300) is presented in FIG. 98. In FIG. 98, the first N-terminal 127 amino acids common to the HUMGRP5E_P5 (SEQ ID NO: 1300) splice variant, WT GRP 1 (SEQ ID NO:1421) and WT GRP 3 (SEQ ID NO:1789) are shown in bold; the 2 amino acids that appear in WT GRP 2 (SEQ ID NO:1788) and are absent in HUMGRP5E_P5 (SEQ ID NO: 1300) are double underlined; and the 15 amino acids common to HUMGRP5E_P5 (SEQ ID NO: 1300) and WT GRP isoform 2 (SEQ ID NO:1788) are underlined.

1. Protein Production of HUMGRP5E P5 (SEQ ID NO: 1300), WT GRP 1 (SEQ ID NO:1421) and HUMGRP5E P5 Representing Fragment (SEQ ID NO: 1794)
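The composition of the HUMGRP5E_P5 splice variant described above can be sketched as a head-plus-tail assembly. The sequence below is an assumption: it is built from the 127-amino-acid head and the 15-amino-acid tail (SEQ ID NO: 1764) quoted in the comparison report, not read from the sequence listing, and the helper name `bridge` is hypothetical.

```python
# Assumed reconstruction of HUMGRP5E_P5 from the fragments quoted in the
# comparison report: amino acids 1-127 plus the tail DSLLQVLNVKEGTPS.
HEAD = (
    "MRGSELPLVLLALVLCLAPRGRAVPLPAGGGTVLTKMYPRGNHWAVGHLMGKKSTGESSS"
    "VSERGSLKQQLREYIRWEEAARNLLGLIEAKENRNHQPPQPKALGNQQPSWDSEDSSNFK"
    "DVGSKGK"
)
P5_TAIL = "DSLLQVLNVKEGTPS"
P5 = HEAD + P5_TAIL

# Regions of the variant as described in the text (1-based, inclusive).
REGIONS = [
    ("shared with WT GRP 1, 2 and 3", 1, 121),
    ("shared with WT GRP 1 and 3 only", 122, 127),
    ("shared with WT GRP 2 only", 128, 142),
]


def bridge(seq, junction, flank):
    """Return the residues spanning the junction, `flank` on each side."""
    return seq[junction - flank:junction + flank]


for label, start, end in REGIONS:
    print(f"{start}-{end} ({end - start + 1} aa): {label}")
print("bridge:", bridge(P5, junction=127, flank=3))
```

The bridge peptide straddling positions 127 and 128 is the feature that distinguishes the variant from each of the known isoforms taken individually.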

As a tool for antibody development and ELISA assay development, the following recombinant proteins were produced: HUMGRP5E_P5 splice variant (GRP_142) (SEQ ID NO: 1793) and WT GRP 1 (GRP_148) (SEQ ID NO:1792) proteins. In addition, a peptide representing the 15 amino acids (SEQ ID NO: 1794) from the C-terminal tail of WT GRP 2 (SEQ ID NO:1788) and HUMGRP5E_P5 (SEQ ID NO: 1300) was synthesized, to be used as a negative selection tool.

1.1 Cloning and Expression in Mammalian Cells

1.1.1. Cloning of HUMGRP5E P5 (SEQ ID NO: 1793) (GRP-142) and WT GRP 1 (GRP-148) (SEQ ID NO:1792)

GRP sequences starting from the predicted protein cleavage site (corresponding to amino acid at position 54) were codon optimized to boost protein expression in mammalian cells. In addition, bacterial low-usage codons were eliminated to enable bacterial expression of the variants using the same DNA fragment.

The optimized genes were chemically synthesized at GeneArt (Germany) using their proprietary gene synthesis technology, with the addition of DNA sequences encoding an 8× His tag downstream to the ectopic IL6 signal peptide. The resulting DNA sequences for GRP-148 (SEQ ID NO: 1790) and GRP-142 (SEQ ID NO: 1791) are shown in FIGS. 99 a and 99 b, respectively. The IL6 signal peptide was added to enable secretion from mammalian cells, while the His-tag was added to facilitate protein purification. Flanking EcoRI and NotI sites were introduced at the 5′ and 3′ ends of the DNA fragments respectively (underlined in FIGS. 99 a and 99 b). Protein sequences for GRP-148 (SEQ ID NO: 1792) and GRP-142 (SEQ ID NO: 1793) are shown in FIGS. 100 a and 100 b, respectively.

The DNA fragments were cloned into EcoRI and NotI sites of pIRESpuro3 (Clontech, cat #PT3646-5) and the DNA sequence was verified. Plasmid maps are shown in FIGS. 101 a and 101 b.

The expected MWs of the 2 mammalian proteins are:

-   GRP142 (SEQ ID NO: 1793) 11.4 kDa
-   GRP148 (SEQ ID NO: 1792) 12.0 kDa

1.1.2. Mammalian Expression of GRP Proteins

GRP constructs were transfected into HEK-293T cells (ATCC catalog number CRL-11268) as follows: One day prior to transfection, one well in a 6-well plate was plated with 500,000 cells in 2 ml DMEM (Dulbecco's modified Eagle's medium; Biological Industries, Cat#: 01-055-1A) containing 10% FBS and incubated at 37° C. in a 5% CO2 humidified incubator. Transfection was done with FuGENE 6 Transfection Reagent (Roche, Cat#: 1-814-443) according to the manufacturer's protocol. After 48 h, transfected cells were split and subjected to antibiotic selection using 5 microgram/ml puromycin. The surviving cells were propagated for 2-3 weeks.

Expression of the recombinant proteins in the supernatant of transfected cells was verified by Western Blot (WB) analysis using anti-His antibodies (Serotec, Cat. #MCA1396) as shown in FIG. 102, lanes 6 and 7.

1.2. Production and Purification of GRP-148 (SEQ ID NO: 1792)

1.2.1. Mammalian Production of the GRP-148 (SEQ ID NO: 1792)

In order to produce sufficient amounts of the protein, HEK293T cells expressing GRP-148 (SEQ ID NO: 1792) were further propagated in serum-free medium as described below. Cells were taken from a T-80 flask containing serum supplemented medium after trypsinization, and were transferred into shake flasks containing serum free medium (EX-CELL293, JRH) supplemented with 4 mM glutamine and selection antibiotics (5 ug/ml puromycin). Cultures were incubated at 37° C. on a shaker, at 100 RPM, and were diluted into a consecutive shake flask containing fresh medium when cell density reached 2-3.1×10⁶ cell/ml. After several passages in serum-free medium the adapted cells served as an inoculum for production.

Production of GRP-148 (SEQ ID NO: 1792) was carried out in a stirred-tank bioreactor equipped with an acoustic cell retention device (ADI Autoclavable Glass Bioreactor, Applikon and BioSep10, Applisens), operated in perfusion mode. The bioreactor was seeded at a cell density of 1.1×10⁶ cells/ml, in a final working volume of 3.5 L. After a growth phase of 4 days, a production phase of 17 days in serum-free medium supplemented with 4 mM glutamine (without selection antibiotics) was carried out, during which the growth temperature was reduced from 37° C. to 34° C. During the production phase, cell density reached 2.6×10⁷ cells/ml and the culture was fed at a perfusion rate of 1-3 replacements per day. The total of 112 L of harvest collected was filtered through a 0.22 um filter and used for protein purification. The GRP-148 (SEQ ID NO: 1792) harvest was concentrated approximately 10-fold and the buffer was exchanged to diafiltration buffer (50 mM NaH₂PO₄, 0.3 M NaCl, pH 8.0) using a PALL ultrafiltration system. Imidazole solution (2 M, pH 8.0) was added to a final concentration of 10 mM, and the harvest was filtered through a 0.22 um filter.

1.2. 2 Purification of the GRP-148 (SEQ ID NO: 1792).

Purification process was carried out using a gravity-flowcolumn(Econo-pac 20 ml, BioRad) for binding, and AKTA Explorer (GEHealthcare) for washing and elution. 1 ml Ni-NTA was washed andequilibrated in a gravity-flow column. The resin was transferred into a250 ml vessel, the treated harvest (total volume 0.2L) was added andincubated over night rolling at 4° C. to allow binding of the protein.On the following day the resin was packed in a 5/50 Tricorn column (GEHealthcare). The column was connected to the AKTA system and washed with15 CV (column volumes) of buffer A (50 mM NaH₂PO4, 0.3 M NaCl, 10 mMImidazole, pH 8.0) at flow rate of 1 ml/min. Elution was carried outwith 10 CV of buffer of buffer B (50 mM NaH₂PO4, 0.3 M NaCl, 250 mMImidazole, pH 8.0 at a flow-rate of 0.4 ml/min. The eluted fractionswere pooled and dialyzed against dialysis buffer (Dulbecco's Phosphatebuffers saline pH 7.4 (w/o Ca, w/o Mg)) over night at 4° C. with 3buffer exchanges of 5L each. The dialyzed protein was filtered through0.45 um filter, aliquoted and stored at −70° C.

Samples of the purified proteins were analyzed by SDS-PAGE stained by Coomassie, as shown in FIG. 103. The identity of the purified protein was verified by mass spectrometry analysis.

1.3. Mammalian Production and Purification of the GRP-142 (SEQ ID NO: 1793)

1.3.1 Mammalian Production of the GRP-142 (SEQ ID NO: 1793)

In order to produce sufficient amounts of the protein, the HEK293T cells expressing GRP-142 (SEQ ID NO: 1793) were further propagated in serum-free medium as described below. HEK293T cells expressing GRP-142 (SEQ ID NO: 1793) were taken from a T-80 flask containing serum-supplemented medium after trypsinization, and were transferred into shake flasks containing serum-free medium (EX-CELL293, JRH) supplemented with 4 mM glutamine and selection antibiotics (5 μg/ml puromycin). Cultures were incubated at 37° C. on a shaker at 100 RPM, and when the cell density reached 2-4×10⁶ cells/ml, the cells were diluted into a consecutive shake flask containing fresh medium. After several passages in serum-free medium the adapted cells served as an inoculum for production.

Production-phase growth of GRP-142 (SEQ ID NO: 1793) was carried out in a hollow-fiber bioreactor (Accusyst-Maximizer, Biovest) operated in perfusion mode. The hollow-fiber cultureware (1.5 m², cellulose acetate double column) was inoculated with 8.5×10⁹ viable cells. The culture was fed with basal medium (IMDM supplemented with additional 2 mM glutamine) on the non-cell side of the fibers (intra-capillary), and with a complete serum-free medium (EX-CELL293, JRH) on the cell side of the fibers (extra-capillary).

The production was carried out for 75 days, during which a total of 325 L of harvest was collected. Due to the small size of GRP-142, it passed through the fiber. Hence, harvest was collected from both the intra-capillary (11 L) and the extra-capillary (314 L) fluids. All harvest batches were filtered through a 0.22 μm filter and used for protein purification.

1.3.2 Purification of the GRP-142 (SEQ ID NO: 1793)

The purification process was carried out using a gravity column for binding, and AKTA Explorer (GE Healthcare) for washing and elution. 1 ml Ni-NTA was washed and equilibrated in a 20 ml gravity column. The resin was transferred into a 500 ml vessel, the treated harvest (total volume of 0.54 L) was added and incubated overnight on a roller at 4° C. to allow binding of the protein. On the following day the resin was packed in a 5/50 Tricorn column. The column was connected to the AKTA system and washed with 15 CV of buffer A at a flow rate of 1 ml/min. Elution was carried out by applying 10 CV of buffer B at a flow rate of 0.4 ml/min. The eluted fractions were pooled and dialyzed against dialysis buffer overnight at 4° C. with 3 buffer exchanges of 5 L each. The dialyzed protein was filtered through a 0.45 μm filter, aliquoted and stored at −70° C.

The identity of the purified proteins was verified by mass spectrometry analysis. Samples of the purified proteins obtained in all of the above purification batches were analyzed by SDS-PAGE stained by Coomassie, as shown in FIG. 104.

1.4. HUMGRP5E_P5 C-Terminal Peptide Tail (SEQ ID NO: 1794)

A peptide of 16 amino acids (SEQ ID NO: 1794), consisting of 15 amino acids common to the HUMGRP5E_P5 (SEQ ID NO: 1300) splice variant and the WT GRP 2 (SEQ ID NO: 1788) isoform and a C-terminal Cys (added to facilitate coupling to BSA and agarose beads), was synthesized by Sigma-Aldrich Israel-LTD with a purity of ≧95%.

HUMGRP5E_P5 C-terminal peptide tail sequence (DSPS16) (SEQ ID NO: 1794): H-Asp-Ser-Leu-Leu-Gln-Val-Leu-Asn-Val-Lys-Glu-Gly-Thr-Pro-Ser-Cys-OH

2. Antibody Development

In order to test the HUMGRP5E_P5 (SEQ ID NO: 1300) protein expression pattern in serum samples of diseased and healthy individuals, specific polyclonal antibodies were developed as described below.

The antibody of interest had to specifically recognize HUMGRP5E_P5 (SEQ ID NO: 1300), without recognizing WT GRP 1 (SEQ ID NO:1421) and without recognizing the HUMGRP5E_P5 C-terminal peptide tail (SEQ ID NO:1794), which is common to the HUMGRP5E_P5 variant of the present invention (SEQ ID NO:1300) and to the known GRP-2 isoform (SEQ ID NO:1788). Therefore, serum titers as well as the resultant antibodies were tested against all three protein/peptide preparations following successful recognition of the HUMGRP5E_P5 (SEQ ID NO: 1300)-specific immunogen.

Peptide Design and Synthesis

One peptide was selected as the HUMGRP5E_P5 (SEQ ID NO: 1300)-specific immunogen for polyclonal antibody development. The peptide sequence in the area of the unique bridge was used as a template.

Selected HUMGRP5E_P5 (SEQ ID NO: 1300)-specific immunogen: The primary sequence of the immunogen peptide (CGEN0601 (SEQ ID NO: 1795)) is shown below. A terminal cysteine residue was used to facilitate coupling via m-maleimidobenzoyl-N-hydroxysuccinimide ester (MBS) to Keyhole Limpet Hemocyanin (KLH).

Peptide CGEN0601 (SEQ ID NO: 1795): Ac-SKGKDSLLQVL-Ahx-C-amide
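For illustration only, the relationship between the immunogen peptide CGEN0601 and the C-terminal peptide tail (DSPS16) can be made explicit by comparing their sequences. The sketch below uses the one-letter form of the DSPS16 sequence given above and the core of CGEN0601 without its acetyl, Ahx-C and amide modifications; `longest_common_substring` is a hypothetical helper written for this example and is not part of any described method.

```python
IMMUNOGEN = "SKGKDSLLQVL"          # CGEN0601 core (Ac-, -Ahx-C and amide omitted)
TAIL_PEPTIDE = "DSLLQVLNVKEGTPSC"  # DSPS16, one-letter form of SEQ ID NO: 1794

def longest_common_substring(a, b):
    """Return the longest contiguous stretch of `a` that also occurs in `b`."""
    best = ""
    for i in range(len(a)):
        for j in range(i + 1, len(a) + 1):
            piece = a[i:j]
            if len(piece) > len(best) and piece in b:
                best = piece
    return best

shared = longest_common_substring(IMMUNOGEN, TAIL_PEPTIDE)
unique_part = IMMUNOGEN.replace(shared, "")

print("tail peptide length:", len(TAIL_PEPTIDE))
print("shared stretch:", shared)
print("immunogen residues outside the shared stretch:", unique_part)
```

The two peptides share a stretch of seven residues, so some antibodies raised against the immunogen might be expected to bind the tail peptide as well; this reading is offered as a plausible interpretation rather than a statement of the source.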

This peptide represents an internal region of the protein sequence and it was therefore blocked at the amino terminal end by acetylation and at its carboxy end by amidation. The illustration in FIG. 105 shows the sequence of the selected immunogen marked on the primary sequence of the HUMGRP5E_P5 (SEQ ID NO: 1300) protein.

The immunogen peptide was synthesized using a conventional technology (50 mg; purity ≧90%). The peptide was conjugated to Keyhole Limpet Hemocyanin (KLH) and Bovine Serum Albumin (BSA) using an m-maleimidobenzoyl-N-hydroxysuccinimide ester (MBS) linker.

2.2 Rabbit Polyclonal Antibody Development

2.2.1. Rabbit Immunization and Sera Testing

Three New Zealand White Rabbits (8346, 8348 and 8349) were immunized with CGEN0601 (SEQ ID NO: 1795) conjugated with KLH. The immunization schedule and production bleed schedule are summarized in Tables 136 and 137, respectively.

TABLE 136 Summary of rabbit immunization and test bleed schedule (scheduled dates).

Rabbit #   Pre Bleed       Initial Injection   Boost #1          Boost #2          Boost #3          Test Bleed #1
                           (500 μg ID/CFA)     (250 μg ID/IFA)   (250 μg SC/IFA)   (250 μg SC/IFA)
8346       Jun. 12, 2006   Jun. 16, 2006       Jun. 23, 2006     Jun. 30, 2006     Jul. 14, 2006     Jul. 24, 2006
8348       Jun. 12, 2006   Jun. 16, 2006       Jun. 23, 2006     Jun. 30, 2006     Jul. 14, 2006     Jul. 24, 2006
8349       Jun. 12, 2006   Jun. 16, 2006       Jun. 23, 2006     Jun. 30, 2006     Jul. 14, 2006     Jul. 24, 2006

TABLE 137 Summary of rabbit production bleed schedule (scheduled dates).

Production Bleeds #1 to #7
Rabbit 8346: Aug. 3, 2006; Aug. 14, 2006; Aug. 21, 2006; Sep. 25, 2006
Rabbit 8348: Aug. 3, 2006; Aug. 14, 2006; Aug. 21, 2006; Sep. 11, 2006; Sep. 18, 2006; Sep. 25, 2006; Oct. 9, 2006
Rabbit 8349: Aug. 3, 2006; Aug. 14, 2006; Aug. 21, 2006; Sep. 11, 2006; Sep. 18, 2006; Sep. 25, 2006; Oct. 9, 2006

Production Bleeds #8 to #14
Rabbit 8346: no dates listed
Rabbit 8348: Oct. 16, 2006; Nov. 13, 2006; Nov. 20, 2006; Nov. 27, 2006; Dec. 11, 2006; Dec. 18, 2006; Dec. 25, 2006
Rabbit 8349: Oct. 16, 2006; Nov. 13, 2006; Nov. 20, 2006; Nov. 27, 2006; Dec. 11, 2006; Dec. 18, 2006; Dec. 25, 2006

Production Bleeds #15 to #17 and Terminal Bleed
Rabbit 8346: Oct. 30, 2007
Rabbit 8348: Jan. 2, 2007
Rabbit 8349: Jan. 2, 2007; Jan. 8, 2007; Jan. 15, 2007; XXXXX

Production bleeds were collected and antibody titers were determined by ELISA using CGEN0601 (SEQ ID NO: 1795) peptide conjugated with BSA, recombinant HUMGRP5E_P5 (SEQ ID NO: 1793) splice variant, WT GRP 1 protein (SEQ ID NO:1792) and HUMGRP5E_P5 C-terminal peptide tail (SEQ ID NO: 1794).

Rabbit 8346 showed a lower antibody titer against the splice variant (SEQ ID NO:1793) (SVr) protein as compared to rabbits 8348 and 8349; therefore, only a few production bleeds were collected from this rabbit and its bleeds were not purified.

2.2.2 Rabbit Polyclonal Antibody Affinity Purification

Affinity purification was performed on all production bleeds collected from the two rabbits (8348 and 8349) using a CGEN0601 (SEQ ID NO: 1795) immunoaffinity resin. Two passes of PBS-diluted antiserum (1:1) were run on immunoaffinity resin prepared by coupling 10 mg Peptide CGEN0601 (SEQ ID NO: 1795) (Lot 06-2996-2137) [Sequence: Ac-SKGKDSLLQVL-Ahx-C-amide] to agarose beads. The purified product was concentrated to approximately 1 mg/ml and dialyzed against 1×PBS. The yield obtained from these purifications is summarized in Table 138 below.

TABLE 138

Lot Number   Rabbit   Concentration   Volume    Total Yield   Buffer
18878C       8349     1.10 mg/ml      37.0 ml   40.7 mg       0.02 M Potassium Phosphate, 0.15 M Sodium Chloride, pH 7.2
18980C       8348     1.0 mg/ml       82.0 ml   82.0 mg       0.02 M Potassium Phosphate, 0.15 M Sodium Chloride, pH 7.2

Purified antibodies were assayed by ELISA for reactivity towards the immunogen (SEQ ID NO: 1795) conjugated to BSA, splice variant protein (SEQ ID NO: 1793), wild type protein (SEQ ID NO:1792), and HUMGRP5E_P5 C-terminal peptide tail (SEQ ID NO: 1794) conjugated to BSA. Results are summarized in FIGS. 106 and 107.

Reactivity of the purified antibodies to both the splice variant and the wild type proteins was also tested by Western blot analysis of both purified antibody preparations. The results suggested good recognition of the HUMGRP5E_P5 (SEQ ID NO: 1793) splice variant and no recognition of the WT GRP 1 (SEQ ID NO:1792) protein. The data are shown in FIGS. 108 and 109.

The two antibody preparations described above showed good binding to the HUMGRP5E_P5 (SEQ ID NO:1793) splice variant and low recognition of the WT GRP 1 protein (SEQ ID NO:1792). The binding of the purified antibodies to the HUMGRP5E_P5 C-terminal peptide tail (SEQ ID NO:1794) was high in both preparations.

Rabbit 8349 (Lot #18878C) had higher titers against the HUMGRP5E_P5 (SEQ ID NO: 1793) splice variant and lower titers against the HUMGRP5E_P5 C-terminal peptide tail (SEQ ID NO:1794) as compared to Rb8348 (Lot #18980). Therefore, this lot was selected for cross absorption against the HUMGRP5E_P5 C-terminal peptide tail (SEQ ID NO:1794) in order to significantly decrease its recognition of known GRP isoforms.

The affinity purified antibody from rabbit 8349 was run over an immunoaffinity resin prepared by coupling 10.0 mg of GRP-negative control Peptide (DSPS16) (SEQ ID NO: 1794) to agarose beads. The flow through was collected as the affinity purified cross adsorbed product. The eluant was collected as the pan reactive antibody. All purified products were concentrated to 1.0 mg/ml and dialyzed against 1×PBS. Prior to final vialing, each antibody was filter sterilized (0.22 μm). The cross absorbed product prepared from lot 18878C was named lot 18978C. The eluant (pan reactive antibody) was named lot 18979C. Antibody yield from cross adsorption is presented in Table 139 below.

TABLE 139 Yield from cross adsorption of Rabbit 8349 (Lot 18878C).

Lot Number                                      Rabbit   Concentration   Volume    Total Yield   Buffer
Rb 8349 Cross absorbed product, Lot 18978C      8349     1.5 mg/ml       18.0 ml   27.0 mg       0.02 M Potassium Phosphate, 0.15 M Sodium Chloride, pH 7.2
Rb 8349 Pan Reactive, Lot 18979C                8349     1.37 mg/ml      2.3 ml    3.1 mg        0.02 M Potassium Phosphate, 0.15 M Sodium Chloride, pH 7.2
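As a simple consistency check offered for illustration, the total yields reported in Tables 138 and 139 can be recomputed as concentration multiplied by volume. The dictionary layout and the 0.1 mg tolerance used below are choices made for this sketch only.

```python
# lot number -> (concentration in mg/ml, volume in ml, reported total yield in mg)
lots = {
    "18878C": (1.10, 37.0, 40.7),   # Table 138, rabbit 8349
    "18980C": (1.0, 82.0, 82.0),    # Table 138, rabbit 8348
    "18978C": (1.5, 18.0, 27.0),    # Table 139, cross absorbed product
    "18979C": (1.37, 2.3, 3.1),     # Table 139, pan reactive antibody
}

# Recompute each yield and record whether it matches the reported value
# to within 0.1 mg (an arbitrary rounding tolerance chosen here).
consistent = {}
for lot, (conc_mg_per_ml, volume_ml, reported_mg) in lots.items():
    computed_mg = conc_mg_per_ml * volume_ml
    consistent[lot] = abs(computed_mg - reported_mg) < 0.1
    print(f"{lot}: computed {computed_mg:.1f} mg, reported {reported_mg} mg")
```

All four reported yields agree with the recomputed values to within rounding.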

Antibodies prepared by cross absorption (Rb 8349 cross absorbed product, lot #18978C) were assayed for reactivity towards the immunogen (SEQ ID NO:1795) conjugated to BSA, splice variant protein (SEQ ID NO:1793), wild type protein (SEQ ID NO:1792), and HUMGRP5E_P5 C-terminal peptide tail (SEQ ID NO:1794) conjugated to BSA. Results are presented in FIG. 110.

The cross absorbed antibodies showed good recognition of the HUMGRP5E_P5 (SEQ ID NO:1793) splice variant and low recognition of both WT GRP 1 (SEQ ID NO:1792) and the HUMGRP5E_P5 C-terminal peptide tail (SEQ ID NO:1794). Therefore, this preparation was later used for assay development.

3. HUMGRP5E_P5 (SEQ ID NO: 1300) Assay Development

The assay development stage for HUMGRP5E_P5 (SEQ ID NO: 1300) was carried out through CommonWealth Biotechnologies (CBI), Inc., a US-based service provider, using serum samples of lung cancer patients.

For assay development purposes, a polyclonal antibody preparation (Rockland polyclonal, Rabbit 8349, cross absorbed on the GRP-negative control peptide) was used. As indicated above, this antibody was developed against a synthetic peptide Acetyl-SKGKDSLLQVL-amide (SEQ ID NO: 1795), comprising the unique bridge specific for the HUMGRP5E_P5 (SEQ ID NO: 1300) splice variant.

Three ELISA formats were developed in order to identify the most sensitive assay format for the detection of HUMGRP5E_P5 (SEQ ID NO: 1300) protein in serum:

a. Sandwich ELISA

b. Antibody capture competitive ELISA

c. Antigen capture competitive ELISA

3.1 Sandwich ELISA

In order to develop a sandwich ELISA test, the cross absorbed polyclonal antibody (Rb8349 cross absorbed product) was tested both as a capture and as a detector antibody. To serve as a detector, the antibody was labeled with biotin. The sandwich assay format was not able to detect HUMGRP5E_P5 (SEQ ID NO: 1793) spiked in serum at any of the concentrations that were tested (≦1 μg/ml).

3.2 Antibody Capture Competitive ELISA

ELISA plates were coated with the antibody and its binding to biotin-labeled HUMGRP5E_P5 (SEQ ID NO: 1793) spiked in serum samples was assessed. Non-labeled HUMGRP5E_P5 (SEQ ID NO: 1793) was tested as a competing antigen. The antibody capture assay format was the following (Format 1):

Coat: Rabbit 8349, cross absorbed product

Detector: HUMGRP5E_P5 (SEQ ID NO: 1793) biotin-labeled protein

LOD for HUMGRP5E_P5 (SEQ ID NO: 1793): ˜14 ng/ml

3.3 Antigen Capture Competitive ELISA

ELISA plates were coated with HUMGRP5E_P5 (SEQ ID NO: 1793) splice variant protein and its binding to antibody pre-incubated with peptide-spiked serum samples was assessed. The antigen capture assay format was the following (Format 2):

Coat: HUMGRP5E_P5 (SEQ ID NO: 1793) protein

Detector: Rabbit 8349, Cross absorbed product

LOD for HUMGRP5E_P5 (SEQ ID NO: 1793): ˜10 ng/ml

The results observed with the various assay formats showed a comparable performance of both the antigen and the antibody capture competitive tests, with a slightly lower LOD for Format 2. Format 2 did not recognize spiked WT GRP 1 (SEQ ID NO:1792) and HUMGRP5E_P5 C-terminal peptide tail (SEQ ID NO: 1794) samples (up to a concentration of 0.88 nmol/ml, which is equivalent to 10 μg/ml of HUMGRP5E_P5 (SEQ ID NO: 1793)).
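The stated equivalence between 0.88 nmol/ml and 10 μg/ml implies a molecular weight for the recombinant HUMGRP5E_P5 protein, and the same factor can be used to express the reported limits of detection in molar units. The sketch below is a worked example only; the molecular weight is derived from the two figures in the text and is not a value given in the source, and `ng_per_ml_to_nM` is a hypothetical helper.

```python
mass_conc_ug_per_ml = 10.0       # from the text
molar_conc_nmol_per_ml = 0.88    # from the text

# 1 ug / 1 nmol = 1e-6 g / 1e-9 mol = 1000 g/mol
implied_mw_da = mass_conc_ug_per_ml / molar_conc_nmol_per_ml * 1000.0

def ng_per_ml_to_nM(ng_per_ml, mw_da):
    """Convert a mass concentration (ng/ml) to nanomolar for a given MW (g/mol)."""
    return ng_per_ml / mw_da * 1000.0

lod_format1_nM = ng_per_ml_to_nM(14.0, implied_mw_da)   # Format 1 LOD, ~14 ng/ml
lod_format2_nM = ng_per_ml_to_nM(10.0, implied_mw_da)   # Format 2 LOD, ~10 ng/ml

print(f"implied molecular weight: {implied_mw_da:.0f} Da")
print(f"Format 1 LOD: {lod_format1_nM:.2f} nM")
print(f"Format 2 LOD: {lod_format2_nM:.2f} nM")
```

On these figures the cross-reactivity test was run at up to roughly a thousandfold molar excess over the Format 2 limit of detection.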

It was therefore decided to continue with the antigen capture competitive assay format for testing of serum samples.

4. Serum Screening

Sera from Small Cell Lung Cancer (SCLC) patients and control sera (ProMedDx) were tested using the HUMGRP5E_P5 antigen capture competitive assay described above.

4.1 Serum Samples Screening

Sera from eight Small Cell Lung Cancer (SCLC) patients and 21 gender-matched control sera (mean age: 65 y ± 7 and 50 y ± 2, respectively) were assayed using the optimized HUMGRP5E_P5 (SEQ ID NO: 1300) antigen capture competitive assay (Format 2, above). The results are presented in Table 140 and in FIG. 111.

TABLE 140 Concentration of CgenGRP in control and SCLC patients' sera (Serum screening 7.1).

Lung cancer
Sample ID   CgenGRP concentration, ng/ml
P132        28
P220        95
P490        65
P8          41
P805        55
P873        55
P90         32
P93         55
Mean        53
St. Dev.    21

Normal controls
Sample ID   CgenGRP concentration, ng/ml
11069742    *
11069756    61
11069783    *
11069722    43
11069743    48
11069803    4
11069725    10
11069754    *
11069769    **
11069784    *
11069785    *
11069794    *
11069758    *
11069739    *
11069736    *
11069745    18
11069765    18
11069767    15
11069790    13
11069780    *
11069771    *
Mean        11
St. Dev.    18

* Below LOD
** Detected, but below LOQL

The results revealed that the HUMGRP5E_P5 (SEQ ID NO: 1300) concentrations detected in SCLC sera are higher than the HUMGRP5E_P5 (SEQ ID NO: 1300) concentrations detected in the control sera. The mean concentration of HUMGRP5E_P5 (SEQ ID NO: 1300) was 53.2±21.4 ng/ml for patients and 11.0±18.1 ng/ml for controls. Three control samples (out of the 21 tested) showed positive signals in the range observed for patients and 6 control samples showed signals lower than the range observed for patients. The remaining 12 controls had no signal. The results indicate that HUMGRP5E_P5 (SEQ ID NO: 1300) can serve as a serum marker for the detection of SCLC patients.
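The summary statistics above can be approximately reproduced from the individual values listed in Table 140. The following sketch is illustrative only and rests on one assumption not stated in the text: control samples reported as below LOD or below LOQL are counted as 0 ng/ml, a convention that happens to reproduce the reported control mean.

```python
from statistics import mean, stdev

# Individual concentrations (ng/ml) read from Table 140.
patients = [28, 95, 65, 41, 55, 55, 32, 55]
controls_detected = [61, 43, 48, 4, 10, 18, 18, 15, 13]

# Assumption: the 12 controls below LOD/LOQL are treated as 0 ng/ml.
controls = controls_detected + [0] * 12

patient_mean, patient_sd = mean(patients), stdev(patients)   # sample SD
control_mean, control_sd = mean(controls), stdev(controls)

# Controls falling inside, or below, the range observed for patients.
in_patient_range = [c for c in controls_detected if c >= min(patients)]
below_patient_range = [c for c in controls_detected if c < min(patients)]

print(f"patients: {patient_mean:.1f} +/- {patient_sd:.1f} ng/ml (n={len(patients)})")
print(f"controls: {control_mean:.1f} +/- {control_sd:.1f} ng/ml (n={len(controls)})")
print(len(in_patient_range), "controls in patient range;",
      len(below_patient_range), "controls below it")
```

The recomputed values agree with the reported means and with the control standard deviation; the recomputed patient standard deviation is about 21.1 ng/ml, slightly below the 21.4 ng/ml quoted, presumably because the tabulated values are rounded.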

The antibodies specific for the HUMGRP5E_P5 (SEQ ID NO: 1300) splice variant were able to detect HUMGRP5E_P5 (SEQ ID NO: 1300) variant protein in serum samples, including serum from Small Cell Lung Cancer patients. However, according to additional results that are not shown, the sensitivity and reproducibility of the results were hampered by apparently low levels of the protein in serum and also by technical problems with the assays.

Description for Cluster D56406

Cluster D56406 features 3 transcript(s) and 10 segment(s) of interest, the names for which are given in Tables 141 and 142, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 143.

TABLE 141 Transcripts of interest

Transcript Name     Sequence ID No.
D56406_PEA_1_T3     22
D56406_PEA_1_T6     23
D56406_PEA_1_T7     24

TABLE 142 Segments of interest

Segment Name             Sequence ID No.
D56406_PEA_1_node_0      340
D56406_PEA_1_node_13     341
D56406_PEA_1_node_11     342
D56406_PEA_1_node_2      343
D56406_PEA_1_node_3      344
D56406_PEA_1_node_5      345
D56406_PEA_1_node_6      346
D56406_PEA_1_node_7      347
D56406_PEA_1_node_8      348
D56406_PEA_1_node_9      349

TABLE 143 Proteins of interest

Protein Name        Sequence ID No.
D56406_PEA_1_P2     1301
D56406_PEA_1_P5     1302
D56406_PEA_1_P6     1303

These sequences are variants of the known protein Neurotensin/neuromedin N precursor [Contains: Large neuromedin N (NmN-125); Neuromedin N (NmN) (NN); Neurotensin (NT); Tail peptide] (SwissProt accession identifier NEUT_HUMAN), SEQ ID NO: 1422, referred to herein as the previously known protein.

Protein Neurotensin/neuromedin N precursor is known or believed to have the following function(s): Neurotensin may play an endocrine or paracrine role in the regulation of fat metabolism. It causes contraction of smooth muscle. The sequence for protein Neurotensin/neuromedin N precursor is given at the end of the application, as “Neurotensin/neuromedin N precursor [Contains: Large neuromedin N (NmN-125); Neuromedin N (NmN) (NN); Neurotensin (NT); Tail peptide] amino acid sequence”. Protein Neurotensin/neuromedin N precursor localization is believed to be Secreted; Packaged within secretory vesicles.

The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: signal transduction, which are annotation(s) related to Biological Process; neuropeptide hormone, which are annotation(s) related to Molecular Function; and extracellular; soluble fraction, which are annotation(s) related to Cellular Component.

The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <dot expasy dot ch/sprot/>; or Locuslink, available from <dot ncbi dot nlm dot nih dot gov/projects/LocusLink/>.

As noted above, cluster D56406 features 3 transcript(s), which were listed in Table 141 above. These transcript(s) encode for protein(s) which are variant(s) of protein Neurotensin/neuromedin N precursor. A description of each variant protein according to the present invention is now provided.

Variant protein D56406_PEA_1_P2 (SEQ ID NO:1301) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) D56406_PEA_1_T3 (SEQ ID NO:22). An alignment is given to the known protein (Neurotensin/neuromedin N precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between D56406_PEA_1_P2 (SEQ ID NO:1301) and NEUT_HUMAN (SEQ ID NO:1422):

1. An isolated chimeric polypeptide encoding for D56406_PEA_1_P2 (SEQ ID NO:1301), comprising a first amino acid sequence being at least 90% homologous to MMAGMKIQLVCMLLLAFSSWSLCSDSEEEMKALEADFLTNMHTSKISKAHVPSWKMTLLNVCSLVNNLNSPAEETGEVHEEELVARRKLPTALDGFSLEAMLTIYQLHKICHSRAFQHWE corresponding to amino acids 1-120 of NEUT_HUMAN (SEQ ID NO:1422), which also corresponds to amino acids 1-120 of D56406_PEA_1_P2 (SEQ ID NO:1301), a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence ARWLTPVIPALWEAETGGSRGQEMETIPANT (SEQ ID NO: 1773) corresponding to amino acids 121-151 of D56406_PEA_1_P2 (SEQ ID NO:1301), and a third amino acid sequence being at least 90% homologous to LIQEDILDTGNDKNGKEEVIKRKIPYILKRQLYENKPRRPYILKRDSYYY corresponding to amino acids 121-170 of NEUT_HUMAN (SEQ ID NO:1422), which also corresponds to amino acids 152-201 of D56406_PEA_1_P2 (SEQ ID NO:1301), wherein said first, second and third amino acid sequences are contiguous and in a sequential order.
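By way of illustration only, the coordinate relationships recited in the comparison report above can be represented as a small segment map and checked for internal consistency. The data structure and the helper names `variant_to_known` and `check_segments` are hypothetical and are introduced solely for this sketch.

```python
# (variant_start, variant_end, known_start, known_end); None marks the
# variant-specific insertion that has no counterpart in NEUT_HUMAN.
SEGMENTS = [
    (1, 120, 1, 120),
    (121, 151, None, None),
    (152, 201, 121, 170),
]
UNIQUE_INSERT = "ARWLTPVIPALWEAETGGSRGQEMETIPANT"  # SEQ ID NO: 1773

def variant_to_known(pos):
    """Map a 1-based position on the variant to NEUT_HUMAN, or None if unique."""
    for v_start, v_end, k_start, _k_end in SEGMENTS:
        if v_start <= pos <= v_end:
            return None if k_start is None else k_start + (pos - v_start)
    raise ValueError(f"position {pos} is outside the variant")

def check_segments():
    """Verify the segments are contiguous and return the total variant length."""
    expected_start = 1
    for v_start, v_end, k_start, k_end in SEGMENTS:
        assert v_start == expected_start, "segments must be contiguous"
        if k_start is not None:
            assert v_end - v_start == k_end - k_start, "aligned lengths must match"
        expected_start = v_end + 1
    return expected_start - 1

variant_length = check_segments()
print("variant length:", variant_length)
print("insert length:", len(UNIQUE_INSERT))
print("variant 152 maps to known", variant_to_known(152))
```

The 31-residue insertion accounts for the difference between the 201-residue variant and the 170-residue known protein.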

2. An isolated polypeptide encoding for an edge portion of D56406_PEA_1_P2 (SEQ ID NO:1301), comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for ARWLTPVIPALWEAETGGSRGQEMETIPANT (SEQ ID NO: 1773), corresponding to D56406_PEA_1_P2 (SEQ ID NO:1301).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein D56406_PEA_1_P2 (SEQ ID NO:1301) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 144, (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein D56406_PEA_1_P2 (SEQ ID NO:1301) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 144 Amino acid mutations

SNP position(s) on amino acid sequence   Alternative amino acid(s)   Previously known SNP?
30                                       M ->  V                     No
44                                       S ->  P                     No
84                                       V ->                        No
84                                       V ->  A                     No

Variant protein D56406_PEA_1_P2 (SEQ ID NO:1301) is encoded by the following transcript(s): D56406_PEA_1_T3 (SEQ ID NO:22), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript D56406_PEA_1_T3 (SEQ ID NO:22) is shown in bold; this coding portion starts at position 106 and ends at position 708. The transcript also has the following SNPs as listed in Table 145 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein D56406_PEA_1_P2 (SEQ ID NO:1301) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 145 Nucleic acid SNPs

SNP position on nucleotide sequence   Alternative nucleic acid   Previously known SNP?
94                                    G ->  T                    No
95                                    A ->  T                    No
858                                   T ->  G                    Yes
103                                   A ->  G                    Yes
193                                   A ->  G                    No
235                                   T ->  C                    No
339                                   T ->  C                    No
356                                   T ->                       No
356                                   T ->  C                    No
417                                   A ->  T                    No
757                                   T ->                       No
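The nucleotide SNP positions of Table 145 can be related to the amino acid positions of Table 144 through the coding portion coordinates (positions 106 to 708 of transcript D56406_PEA_1_T3). The following sketch is a non-limiting illustration; `codon_of` is a hypothetical helper, and it assumes that the coding portion is read in frame from position 106 with the stop codon lying outside the quoted range.

```python
CDS_START, CDS_END = 106, 708   # coding portion of D56406_PEA_1_T3 (from the text)

def codon_of(nt_pos):
    """Return (codon number, position within codon), both 1-based,
    or None when the nucleotide lies outside the coding portion."""
    if not CDS_START <= nt_pos <= CDS_END:
        return None
    offset = nt_pos - CDS_START
    return offset // 3 + 1, offset % 3 + 1

cds_length_nt = CDS_END - CDS_START + 1
print("coding length:", cds_length_nt, "nt =", cds_length_nt // 3, "codons")

# Nucleotide positions listed in Table 145.
for nt_pos in (94, 95, 103, 193, 235, 339, 356, 417, 757, 858):
    print(nt_pos, "->", codon_of(nt_pos))
```

Under this assumption, nucleotide positions 193, 235 and 356 fall in codons 30, 44 and 84, matching the amino acid positions listed in Table 144, and the 603-nucleotide coding portion corresponds to the 201 amino acids of the variant. Positions 339 and 417 fall on third codon positions and do not appear in Table 144, which would be consistent with silent changes, although the text does not say so.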

Variant protein D56406_PEA_1_P5 (SEQ ID NO:1302) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) D56406_PEA_1_T6 (SEQ ID NO:23). An alignment is given to the known protein (Neurotensin/neuromedin N precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between D56406_PEA_1_P5 (SEQ ID NO:1302) and NEUT_HUMAN (SEQ ID NO:1422):

1. An isolated chimeric polypeptide encoding for D56406_PEA_1_P5 (SEQ ID NO:1302), comprising a first amino acid sequence being at least 90% homologous to MMAGMKIQLVCMLLLAFSSWSLC corresponding to amino acids 1-23 of NEUT_HUMAN (SEQ ID NO:1422), which also corresponds to amino acids 1-23 of D56406_PEA_1_P5 (SEQ ID NO:1302), and a second amino acid sequence being at least 90% homologous to SEEEMKALEADFLTNMHTSKISKAHVPSWKMTLLNVCSLVNNLNSPAEETGEVHEEELVARRKLPTALDGFSLEAMLTIYQLHKICHSRAFQHWELIQEDILDTGNDKNGKEEVIKRKIPYILKRQLYENKPRRPYILKRDSYYY corresponding to amino acids 26-170 of NEUT_HUMAN (SEQ ID NO:1422), which also corresponds to amino acids 24-168 of D56406_PEA_1_P5 (SEQ ID NO:1302), wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated chimeric polypeptide encoding for an edge portion of D56406_PEA_1_P5 (SEQ ID NO:1302), comprising a polypeptide having a length “n”, wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise CS, having a structure as follows: a sequence starting from any of amino acid numbers 23−x to 23; and ending at any of amino acid numbers 24+((n−2)−x), in which x varies from 0 to n−2.
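The start and end formula recited above defines a family of windows of length n that each span the junction between amino acids 23 and 24 (the residues CS). A minimal sketch enumerating these windows is given below for illustration; the function name `junction_windows` and the clipping of windows to the 168-residue length of the variant are assumptions made for this example and are not recited in the text.

```python
def junction_windows(n, left=23, right=24, protein_length=168):
    """Enumerate (start, end) windows of length n spanning the left/right junction.

    Follows the recited formula: start = left - x, end = right + ((n - 2) - x),
    with x running from 0 to n - 2. Windows that would extend past either end
    of the protein are skipped (an assumption made for this sketch).
    """
    windows = []
    for x in range(0, n - 1):            # x = 0 .. n-2
        start = left - x
        end = right + ((n - 2) - x)
        if start >= 1 and end <= protein_length:
            windows.append((start, end))
    return windows

windows_10 = junction_windows(10)
print(len(windows_10), "windows of length 10")
print("first:", windows_10[0], "last:", windows_10[-1])
```

Every window produced this way has length n and contains both position 23 and position 24, so each candidate edge portion includes the junction that distinguishes the variant from the known protein.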

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein D56406_PEA_1_P5 (SEQ ID NO:1302) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 146, (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein D56406_PEA_1_P5 (SEQ ID NO:1302) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 146 Amino acid mutations

SNP position(s) on amino acid sequence   Alternative amino acid(s)   Previously known SNP?
28                                       M ->  V                     No
42                                       S ->  P                     No
82                                       V ->                        No
82                                       V ->  A                     No

Variant protein D56406_PEA_1_P5 (SEQ ID NO:1302) is encoded by the following transcript(s): D56406_PEA_1_T6 (SEQ ID NO:23), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript D56406_PEA_1_T6 (SEQ ID NO:23) is shown in bold; this coding portion starts at position 106 and ends at position 609. The transcript also has the following SNPs as listed in Table 147 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein D56406_PEA_1_P5 (SEQ ID NO:1302) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 147 Nucleic acid SNPs

SNP position on nucleotide sequence   Alternative nucleic acid   Previously known SNP?
94                                    G ->  T                    No
95                                    A ->  T                    No
759                                   T ->  G                    Yes
806                                   G ->  A                    Yes
1014                                  T ->  G                    No
1178                                  T ->  G                    No
103                                   A ->  G                    Yes
187                                   A ->  G                    No
229                                   T ->  C                    No
333                                   T ->  C                    No
350                                   T ->                       No
350                                   T ->  C                    No
411                                   A ->  T                    No
658                                   T ->                       No

Variant protein D56406_PEA_1_P6 (SEQ ID NO:1303) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) D56406_PEA_1_T7 (SEQ ID NO:24). An alignment is given to the known protein (Neurotensin/neuromedin N precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between D56406_PEA_1_P6 (SEQ ID NO:1303) and NEUT_HUMAN (SEQ ID NO:1422):

1. An isolated chimeric polypeptide encoding for D56406_PEA_1_P6 (SEQ ID NO:1303), comprising a first amino acid sequence being at least 90% homologous to MMAGMKIQLVCMLLLAFSSWSLCSDSEEEMKALEADFLTNMHTSK corresponding to amino acids 1-45 of NEUT_HUMAN (SEQ ID NO:1422), which also corresponds to amino acids 1-45 of D56406_PEA_1_P6 (SEQ ID NO:1303), and a second amino acid sequence being at least 90% homologous to LIQEDILDTGNDKNGKEEVIKRKIPYILKRQLYENKPRRPYILKRDSYYY corresponding to amino acids 121-170 of NEUT_HUMAN (SEQ ID NO:1422), which also corresponds to amino acids 46-95 of D56406_PEA_1_P6 (SEQ ID NO:1303), wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated chimeric polypeptide encoding for an edge portion of D56406_PEA_1_P6 (SEQ ID NO:1303), comprising a polypeptide having a length “n”, wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise KL, having a structure as follows: a sequence starting from any of amino acid numbers 45−x to 45; and ending at any of amino acid numbers 46+((n−2)−x), in which x varies from 0 to n−2.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein D56406_PEA_1_P6 (SEQ ID NO:1303) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 148, (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein D56406_PEA_1_P6 (SEQ ID NO:1303) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 148 Amino acid mutations

SNP position(s) on amino acid sequence   Alternative amino acid(s)   Previously known SNP?
30                                       M ->  V                     No
44                                       S ->  P                     No

Variant protein D56406_PEA_1_P6 (SEQ ID NO:1303) is encoded by the following transcript(s): D56406_PEA_1_T7 (SEQ ID NO:24), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript D56406_PEA_1_T7 (SEQ ID NO:24) is shown in bold; this coding portion starts at position 106 and ends at position 390. The transcript also has the following SNPs as listed in Table 149 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein D56406_PEA_1_P6 (SEQ ID NO:1303) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 149 Nucleic acid SNPs SNP position on nucleotide Previouslysequence Alternative nucleic acid known SNP? 94 G -> T No 95 A -> T No103 A -> G Yes 193 A -> G No 235 T -> C No 439 T -> No 540 T -> G Yes587 G -> A Yes 795 T -> G No 959 T -> G No

As noted above, cluster D56406 features 10 segment(s), which were listed in Table 142 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster D56406_PEA_1_node_0 (SEQ ID NO:1135) according to the present invention is supported by 48 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D56406_PEA_1_T3 (SEQ ID NO:22), D56406_PEA_1_T6 (SEQ ID NO:23) and D56406_PEA_1_T7 (SEQ ID NO:24). Table 150 below describes the starting and ending position of this segment on each transcript.

TABLE 150
Segment location on transcripts

Transcript name                   Segment starting position   Segment ending position
D56406_PEA_1_T3 (SEQ ID NO: 22)   1                           178
D56406_PEA_1_T6 (SEQ ID NO: 23)   1                           178
D56406_PEA_1_T7 (SEQ ID NO: 24)   1                           178

Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment (with regard to lung cancer), shown in Table 151.

TABLE 151
Oligonucleotides related to this segment

Oligonucleotide name            Overexpressed in cancers   Chip reference
D56406_0_5_0 (SEQ ID NO: 210)   lung malignant tumors      LUN

Segment cluster D56406_PEA_1_node_13 according to the present invention is supported by 43 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D56406_PEA_1_T3 (SEQ ID NO:22), D56406_PEA_1_T6 (SEQ ID NO:23) and D56406_PEA_1_T7 (SEQ ID NO:24). Table 152 below describes the starting and ending position of this segment on each transcript.

TABLE 152
Segment location on transcripts

Transcript name                   Segment starting position   Segment ending position
D56406_PEA_1_T3 (SEQ ID NO: 22)   559                         902
D56406_PEA_1_T6 (SEQ ID NO: 23)   460                         1239
D56406_PEA_1_T7 (SEQ ID NO: 24)   241                         1020

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.

Segment cluster D56406_PEA_1_node_11 (SEQ ID NO:1137) according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D56406_PEA_1_T3 (SEQ ID NO:22). Table 153 below describes the starting and ending position of this segment on each transcript.

TABLE 153
Segment location on transcripts

Transcript name                   Segment starting position   Segment ending position
D56406_PEA_1_T3 (SEQ ID NO: 22)   466                         558

Segment cluster D56406_PEA_1_node_2 (SEQ ID NO:1138) according to the present invention can be found in the following transcript(s): D56406_PEA_1_T3 (SEQ ID NO:22) and D56406_PEA_1_T7 (SEQ ID NO:24). Table 154 below describes the starting and ending position of this segment on each transcript.

TABLE 154
Segment location on transcripts

Transcript name                   Segment starting position   Segment ending position
D56406_PEA_1_T3 (SEQ ID NO: 22)   179                         184
D56406_PEA_1_T7 (SEQ ID NO: 24)   179                         184

Segment cluster D56406_PEA_1_node_3 (SEQ ID NO:1139) according to the present invention is supported by 46 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D56406_PEA_1_T3 (SEQ ID NO:22), D56406_PEA_1_T6 (SEQ ID NO:23) and D56406_PEA_1_T7 (SEQ ID NO:24). Table 155 below describes the starting and ending position of this segment on each transcript.

TABLE 155
Segment location on transcripts

Transcript name                   Segment starting position   Segment ending position
D56406_PEA_1_T3 (SEQ ID NO: 22)   185                         240
D56406_PEA_1_T6 (SEQ ID NO: 23)   179                         234
D56406_PEA_1_T7 (SEQ ID NO: 24)   185                         240
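The segment positions reported for transcript D56406_PEA_1_T7 in Tables 150, 152, 154 and 155 can be checked for contiguity with a small sketch. It assumes positions are 1-based and inclusive; the helper name `tiles_contiguously` is hypothetical and only illustrates the bookkeeping.

```python
def tiles_contiguously(segments):
    """Return True when the 1-based inclusive (start, end) segments begin at
    position 1 and follow one another with no gaps or overlaps."""
    expected_start = 1
    for start, end in sorted(segments.values()):
        if start != expected_start or end < start:
            return False
        expected_start = end + 1
    return True


# Positions on D56406_PEA_1_T7 as listed in the tables above.
t7_segments = {
    "D56406_PEA_1_node_0": (1, 178),      # Table 150
    "D56406_PEA_1_node_13": (241, 1020),  # Table 152
    "D56406_PEA_1_node_2": (179, 184),    # Table 154
    "D56406_PEA_1_node_3": (185, 240),    # Table 155
}

t7_is_tiled = tiles_contiguously(t7_segments)
t7_lengths = {name: end - start + 1 for name, (start, end) in t7_segments.items()}
```

With these values the four segments cover positions 1 through 1020 of the transcript without gaps.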

Segment cluster D56406_PEA_1_node_5 (SEQ ID NO:1140) according to the present invention is supported by 48 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D56406_PEA_1_T3 (SEQ ID NO:22) and D56406_PEA_1_T6 (SEQ ID NO:23). Table 156 below describes the starting and ending position of this segment on each transcript.

TABLE 156
Segment location on transcripts

Transcript name                   Segment starting position   Segment ending position
D56406_PEA_1_T3 (SEQ ID NO: 22)   241                         355
D56406_PEA_1_T6 (SEQ ID NO: 23)   235                         349

Segment cluster D56406_PEA_1_node_6 (SEQ ID NO:1141) according to the present invention is supported by 34 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D56406_PEA_1_T3 (SEQ ID NO:22) and D56406_PEA_1_T6 (SEQ ID NO:23). Table 157 below describes the starting and ending position of this segment on each transcript.

TABLE 157
Segment location on transcripts

Transcript name                   Segment starting position   Segment ending position
D56406_PEA_1_T3 (SEQ ID NO: 22)   356                         389
D56406_PEA_1_T6 (SEQ ID NO: 23)   350                         383

Segment cluster D56406_PEA_1_node_7 (SEQ ID NO:1142) according to the present invention is supported by 32 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D56406_PEA_1_T3 (SEQ ID NO:22) and D56406_PEA_1_T6 (SEQ ID NO:23). Table 158 below describes the starting and ending position of this segment on each transcript.

TABLE 158
Segment location on transcripts

Transcript name                   Segment starting position   Segment ending position
D56406_PEA_1_T3 (SEQ ID NO: 22)   390                         415
D56406_PEA_1_T6 (SEQ ID NO: 23)   384                         409

Segment cluster D56406_PEA_1_node_8 (SEQ ID NO:1143) according to the present invention can be found in the following transcript(s): D56406_PEA_1_T3 (SEQ ID NO:22) and D56406_PEA_1_T6 (SEQ ID NO:23). Table 159 below describes the starting and ending position of this segment on each transcript.

TABLE 159
Segment location on transcripts

Transcript name                   Segment starting position   Segment ending position
D56406_PEA_1_T3 (SEQ ID NO: 22)   416                         423
D56406_PEA_1_T6 (SEQ ID NO: 23)   410                         417

Segment cluster D56406_PEA_1_node_9 (SEQ ID NO:1144) according to the present invention is supported by 31 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D56406_PEA_1_T3 (SEQ ID NO:22) and D56406_PEA_1_T6 (SEQ ID NO:23). Table 160 below describes the starting and ending position of this segment on each transcript.

TABLE 160
Segment location on transcripts

Transcript name                   Segment starting position   Segment ending position
D56406_PEA_1_T3 (SEQ ID NO: 22)   424                         465
D56406_PEA_1_T6 (SEQ ID NO: 23)   418                         459

Variant protein alignment to the previously known protein:

Sequence name: /tmp/jU49325aMA/8F0XuN7La5:NEUT_HUMAN (SEQ ID NO:1422)
Sequence documentation:
Alignment of: D56406_PEA_1_P2 (SEQ ID NO:1301) x NEUT_HUMAN (SEQ ID NO:1422)
Alignment segment 1/1:
Quality: 1591.00
Escore: 0
Matching length: 170
Total length: 201
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 84.58
Total Percent Identity: 84.58
Gaps: 1
Alignment: [alignment rows not reproduced]

Sequence name: /tmp/wWui8Kd4y9/zbf3ihRwnR:NEUT_HUMAN (SEQ ID NO:1422)
Sequence documentation:
Alignment of: D56406_PEA_1_P5 (SEQ ID NO:1302) x NEUT_HUMAN (SEQ ID NO:1422)
Alignment segment 1/1:
Quality: 1572.00
Escore: 0
Matching length: 168
Total length: 170
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 98.82
Total Percent Identity: 98.82
Gaps: 1
Alignment: [alignment rows not reproduced]

Sequence name: /tmp/f5d07fF5D7/E4N5xjUIAN:NEUT_HUMAN (SEQ ID NO:1422)
Alignment segment 1/1:
Quality: 844.00
Escore: 0
Matching length: 95
Total length: 170
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 55.88
Total Percent Identity: 55.88
Gaps: 1
Alignment: [only the following rows are reproduced]
 45 .................................................. 45
 51 VPSWKMTLLNVCSLVNNLNSPAEETGEVHEEELVARRKLPTALDGFSLEA 100

Expression of NTS D56406 Transcripts which are Detectable by Amplicon as Depicted in Sequence Name D56406_seg7-9F2R2 (SEQ ID NO: 1798) in Normal and Cancerous Lung Tissues

Expression of NTS transcripts detectable by or according to seg7-9F2R2, the D56406_seg7-9F2R2 amplicon (SEQ ID NO: 1798), and primers D56406_seg7-9F2 (SEQ ID NO: 1796) and D56406_seg7-9R2 (SEQ ID NO: 1797) was measured by real time PCR. In parallel, the expression of several housekeeping genes was measured similarly: HPRT1 (GenBank Accession No. NM_000194 (SEQ ID NO: 1714); amplicon: HPRT1-amplicon (SEQ ID NO: 1297)), PBGD (GenBank Accession No. BC019323 (SEQ ID NO: 1713); amplicon: PBGD-amplicon (SEQ ID NO: 334)), SDHA (GenBank Accession No. NM_004168 (SEQ ID NO: 1712); amplicon: SDHA-amplicon (SEQ ID NO: 331)) and Ubiquitin (GenBank Accession No. BC000449 (SEQ ID NO: 1711); amplicon: Ubiquitin-amplicon (SEQ ID NO: 328)). For each RT sample, the expression of the above amplicon was normalized to the normalization factor calculated from the expression of these housekeeping genes as described in normalization method 2 in the “materials and methods” section. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal samples (sample numbers 51-64, 69 and 70, Table 2_1 above), to obtain a value of fold up-regulation for each sample relative to the median of the normal samples.
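The final step described above, dividing each normalized quantity by the median of the normal samples, can be sketched as follows. The sketch starts from quantities that are already normalized (normalization method 2 itself is not reproduced here), and the sample values and the function name `fold_upregulation` are hypothetical illustrations, not data from the application.

```python
from statistics import median


def fold_upregulation(normalized, normal_ids):
    """Divide every sample's normalized quantity by the median of the
    normalized quantities of the normal samples."""
    baseline = median(normalized[sample] for sample in normal_ids)
    if baseline == 0:
        raise ValueError("median of the normal samples is zero")
    return {sample: quantity / baseline for sample, quantity in normalized.items()}


# Hypothetical normalized quantities keyed by sample number.
normalized = {51: 0.8, 52: 1.0, 53: 1.2, 1: 6.0, 2: 0.9}
folds = fold_upregulation(normalized, normal_ids=[51, 52, 53])
```

In this toy input the median of the normal samples is 1.0, so sample 1 has a fold value of 6.0 and would exceed a 5 fold threshold.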

FIG. 112 is a histogram showing overexpression of the above-indicated NTS transcripts in cancerous lung samples relative to the normal samples.

As is evident from FIG. 112, the expression of NTS transcripts detectable by the above amplicon in non-small cell carcinoma samples, specifically in squamous cell carcinoma, was significantly higher than in the non-cancerous samples (sample numbers 51-64, 69 and 70, Table 2_1 above). Notably, an overexpression of at least 5 fold was found in 12 out of 39 non-small cell carcinoma samples and in 8 out of 16 squamous cell carcinoma samples.

Statistical analysis was applied to verify the significance of these results, as described below.

The P value for the difference in the expression levels of NTS transcripts detectable by the above amplicon in lung non-small cell carcinoma samples versus the normal tissue samples was determined by T test as 1.47e-002. The P value for the difference in the expression levels of NTS transcripts detectable by the above amplicon in lung squamous cell carcinoma samples versus the normal tissue samples was determined by T test as 1.46e-002.

A threshold of 5 fold overexpression was found to differentiate between non-small cell carcinoma and normal samples with a P value of 8.91e-003 as checked by exact Fisher test. A threshold of 5 fold overexpression was found to differentiate between squamous cell carcinoma and normal samples with a P value of 1.22e-003 as checked by exact Fisher test.

The above values demonstrate statistical significance of the results.
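The exact Fisher test on the threshold counts can be sketched with the standard library alone. The sketch assumes a one-sided test and assumes that none of the 16 normal samples (sample numbers 51-64, 69 and 70) reached the threshold; the text does not state that count, so it is an assumption of this illustration. The function name `fisher_one_sided` is hypothetical.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]:
    the probability of at least `a` positives in row 1 with all margins fixed."""
    row1, row2 = a + b, c + d
    positives = a + c
    denominator = comb(row1 + row2, positives)
    p_value = 0.0
    for k in range(a, min(row1, positives) + 1):
        p_value += comb(row1, k) * comb(row2, positives - k) / denominator
    return p_value


# Rows: cancer samples, normal samples. Columns: above threshold, not above.
p_non_small_cell = fisher_one_sided(12, 27, 0, 16)   # 12 of 39 vs 0 of 16 (assumed)
p_squamous = fisher_one_sided(8, 8, 0, 16)           # 8 of 16 vs 0 of 16 (assumed)
```

Under these assumptions the two values come out at roughly 8.9e-3 and 1.2e-3, in line with the P values reported above.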

Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: D56406_seg7-9F2 forward primer (SEQ ID NO: 1796); and D56406_seg7-9R2 reverse primer (SEQ ID NO: 1797).

The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: D56406_seg7-9F2R2 (SEQ ID NO: 1798).

Forward Primer (D56406_seg7-9F2 (SEQ ID NO: 1796)):
AGCTCCACAAAATCTGTCACAGC
Reverse Primer (D56406_seg7-9R2 (SEQ ID NO: 1797)):
TGATCCGCCCGTCTCG
Amplicon (D56406_seg7-9F2R2 (SEQ ID NO: 1798)):
AGCTCCACAAAATCTGTCACAGCAGGGCTTTTCAACACTGGGAGGCACGGTGGCTCACGCCTGTAATCCCAGCACTTTGGGAGGCCGAGACGGGCGGATCA
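A quick consistency check on the primer pair and amplicon listed above can be sketched as follows: the forward primer should match the start of the amplicon and the reverse complement of the reverse primer should match its end. The amplicon string is taken from the listing above with whitespace removed; the helper name `reverse_complement` is an illustrative choice.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of an upper-case DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


forward_primer = "AGCTCCACAAAATCTGTCACAGC"   # SEQ ID NO: 1796
reverse_primer = "TGATCCGCCCGTCTCG"          # SEQ ID NO: 1797
amplicon = (                                 # SEQ ID NO: 1798
    "AGCTCCACAAAATCTGTCACAGCAGGGCTTTTCAACACTGGGAGGCACGG"
    "TGGCTCACGCCTGTAATCCCAGCACTTTGGGAGGCCGAGACGGGCGGATCA"
)

forward_matches = amplicon.startswith(forward_primer)
reverse_matches = amplicon.endswith(reverse_complement(reverse_primer))
```

Both checks hold for the sequences as listed, which is what one expects of an amplicon bounded by this primer pair.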

Expression of NTS D56406 Transcripts which are Detectable by Amplicon as Depicted in Sequence Name D56406_seg7-9F2R2 (SEQ ID NO: 1798) in Different Normal Tissues

Expression of NTS transcripts detectable by or according to seg7-9F2R2, the D56406_seg7-9F2R2 amplicon (SEQ ID NO: 1798), and primers D56406_seg7-9F2 (SEQ ID NO: 1796) and D56406_seg7-9R2 (SEQ ID NO: 1797) was measured by real time PCR. In parallel, the expression of several housekeeping genes was measured similarly: SDHA (GenBank Accession No. NM_004168 (SEQ ID NO: 1712); amplicon: SDHA-amplicon (SEQ ID NO: 331)), Ubiquitin (GenBank Accession No. BC000449 (SEQ ID NO: 1711); amplicon: Ubiquitin-amplicon (SEQ ID NO: 328)), RPL19 (GenBank Accession No. NM_000981 (SEQ ID NO: 1715); RPL19 amplicon (SEQ ID NO: 1630)) and TATA box (GenBank Accession No. NM_003194 (SEQ ID NO: 1716); TATA amplicon (SEQ ID NO: 1633)). For each RT sample, the expression of the above amplicon was normalized to the normalization factor calculated from the expression of these housekeeping genes as described in normalization method 2 in the “materials and methods” section. The normalized quantity of each RT sample was then divided by the median of the quantities of the lung samples (sample numbers 28, 29 and 30, Table 3_1 above), to obtain a value of relative expression of each sample relative to the median of the lung samples.

Forward Primer (D56406_seg7-9F2) (SEQ ID NO: 1796):
AGCTCCACAAAATCTGTCACAGC
Reverse Primer (D56406_seg7-9R2) (SEQ ID NO: 1797):
TGATCCGCCCGTCTCG
Amplicon (D56406_seg7-9F2R2) (SEQ ID NO: 1798):
AGCTCCACAAAATCTGTCACAGCAGGGCTTTTCAACACTGGGAGGCACGGTGGCTCACGCCTGTAATCCCAGCACTTTGGGAGGCCGAGACGGGCGGATCA

FIG. 113 is a histogram showing the expression of NTS D56406 transcripts which are detectable by amplicon as depicted in sequence name D56406_seg7-9F2R2 (SEQ ID NO: 1798) in different normal tissues.

Description for Cluster F05068

Cluster F05068 features 3 transcript(s) and 12 segment(s) of interest, the names for which are given in Tables 161 and 162, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 163.

TABLE 161
Transcripts of interest

Transcript Name    Sequence ID No.
F05068_PEA_1_T3    25
F05068_PEA_1_T4    26
F05068_PEA_1_T6    27

TABLE 162
Segments of interest

Segment Name            Sequence ID No.
F05068_PEA_1_node_0     350
F05068_PEA_1_node_10    351
F05068_PEA_1_node_12    352
F05068_PEA_1_node_13    353
F05068_PEA_1_node_4     354
F05068_PEA_1_node_8     355
F05068_PEA_1_node_11    356
F05068_PEA_1_node_3     357
F05068_PEA_1_node_5     358
F05068_PEA_1_node_6     359
F05068_PEA_1_node_7     360
F05068_PEA_1_node_9     361

TABLE 163
Proteins of interest

Protein Name       Sequence ID No.
F05068_PEA_1_P7    1304
F05068_PEA_1_P8    1305

These sequences are variants of the known protein ADM precursor [Contains: Adrenomedullin (AM); Proadrenomedullin N-20 terminal peptide (ProAM-N20) (ProAM N-terminal 20 peptide) (PAMP)] (SwissProt accession identifier ADML_HUMAN), SEQ ID NO:1423, referred to herein as the previously known protein.

Protein ADM precursor is known or believed to have the following function(s): AM and PAMP are potent hypotensive and vasodilator agents. Numerous actions have been reported, most related to the physiologic control of fluid and electrolyte homeostasis. In the kidney, AM is diuretic and natriuretic, and both AM and PAMP inhibit aldosterone secretion by direct adrenal actions. In pituitary gland, both peptides at physiologically relevant doses inhibit basal ACTH secretion. Both peptides appear to act in brain and pituitary gland to facilitate the loss of plasma volume, actions which complement their hypotensive effects in blood vessels. The sequence for protein ADM precursor is given at the end of the application, as “ADM precursor [Contains: Adrenomedullin (AM); Proadrenomedullin N-20 terminal peptide (ProAM-N20) (ProAM N-terminal 20 peptide) (PAMP)] amino acid sequence”. Known polymorphisms for this sequence are as shown in Table 164.

TABLE 164
Amino acid mutations for Known Protein

SNP position(s) on amino acid sequence   Comment
50                                       S -> R (in dbSNP: 5005). /FTId = VAR_014861.

Protein ADM precursor localization is believed to be Secreted.

The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: cAMP biosynthesis; progesterone biosynthesis; signal transduction; cell-cell signaling; pregnancy; excretion; circulation; response to wounding, which are annotation(s) related to Biological Process; ligand; hormone, which are annotation(s) related to Molecular Function; and extracellular space; soluble fraction, which are annotation(s) related to Cellular Component.

The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <dot expasy dot ch/sprot/>; or Locuslink, available from <dot ncbi dot nlm dot nih dot gov/projects/LocusLink/>.

Cluster F05068 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term “number” in the right hand column of the table and the numbers on the y-axis of FIG. 21 refer to weighted expression of ESTs in each category, as “parts per million” (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
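The “parts per million” figure described above is a ratio scaled by one million. A minimal sketch follows; it uses plain EST counts and leaves out the weighting step, which is not described in this section, and the counts and the function name `expression_ppm` are hypothetical.

```python
def expression_ppm(cluster_est_count, category_est_count):
    """Ratio of ESTs assigned to one cluster to all ESTs in a tissue
    category, expressed in parts per million."""
    if category_est_count <= 0:
        raise ValueError("the category must contain at least one EST")
    return cluster_est_count / category_est_count * 1_000_000


# Hypothetical counts: 5 cluster ESTs among 100,000 ESTs in a tissue category.
example_ppm = expression_ppm(5, 100_000)
```

With these hypothetical counts the result is about 50 parts per million, the same order of magnitude as the tissue values in the table.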

Overall, the following results were obtained as shown with regard to the histograms in FIG. 21 and Table 165. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: uterine malignancies.

TABLE 165
Normal tissue distribution

Name of Tissue   Number
bladder          164
bone             259
brain            26
colon            66
epithelial       73
general          67
head and neck    0
kidney           49
liver            0
lung             51
lymph nodes      0
breast           87
ovary            0
pancreas         30
skin             295
stomach          0
Thyroid          0
uterus           13

TABLE 166
P values and ratios for expression in cancerous tissue

Name of Tissue   P1        P2        SP1       R3    SP2       R4
bladder          7.6e−01   8.0e−01   9.4e−01   0.5   9.9e−01   0.4
bone             7.5e−01   8.8e−01   1         0.1   1         0.3
brain            5.2e−01   6.1e−01   7.0e−04   2.1   1.1e−02   1.4
colon            6.2e−01   6.1e−01   9.7e−01   0.5   9.6e−01   0.6
epithelial       1.0e−01   3.0e−02   7.8e−01   0.7   5.8e−01   0.9
general          3.7e−01   2.6e−01   8.5e−01   0.8   9.0e−01   0.8
head and neck    2.1e−01   1.1e−01   1         1.0   3.2e−01   2.3
kidney           3.8e−01   3.9e−01   6.6e−02   1.8   1.2e−02   2.2
liver            1.8e−01   1.2e−01   2.3e−01   4.3   2.3e−01   2.6
lung             6.2e−01   4.3e−01   8.5e−01   0.7   3.8e−01   1.0
lymph nodes      1         3.1e−01   1         1.0   1         1.3
breast           7.8e−01   5.8e−01   9.1e−01   0.6   8.9e−01   0.7
ovary            3.8e−01   2.6e−01   3.2e−01   2.4   1.6e−01   2.5
pancreas         5.1e−01   3.3e−01   7.0e−01   0.9   1.0e−01   1.4
skin             6.0e−01   5.2e−01   9.7e−01   0.3   1         0.1
stomach          3.6e−01   3.0e−01   1         1.0   4.1e−01   1.8
Thyroid          5.0e−01   5.0e−01   6.7e−01   1.7   6.7e−01   1.7
uterus           1.1e−01   2.6e−01   2.1e−03   3.2   2.3e−02   2.2

As noted above, cluster F05068 features 3 transcript(s), which were listed in Table 161 above. These transcript(s) encode for protein(s) which are variant(s) of protein ADM precursor. A description of each variant protein according to the present invention is now provided.

Variant protein F05068_PEA_1_P7 (SEQ ID NO:1304) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) F05068_PEA_1_T3 (SEQ ID NO:25) and F05068_PEA_1_T6 (SEQ ID NO:27). An alignment is given to the known protein (ADM precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between F05068_PEA_1_P7 (SEQ ID NO:1304) and ADML_HUMAN (SEQ ID NO:1423):

1. An isolated chimeric polypeptide encoding for F05068_PEA_1_P7 (SEQ ID NO:1304), comprising a first amino acid sequence being at least 90% homologous to MKLVSVALMYLGSLAFLGADTARLDVASEFRKK corresponding to amino acids 1-33 of ADML_HUMAN (SEQ ID NO:1423), which also corresponds to amino acids 1-33 of F05068_PEA_1_P7 (SEQ ID NO:1304).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein F05068_PEA_1_P7 (SEQ ID NO:1304) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 167 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein F05068_PEA_1_P7 (SEQ ID NO:1304) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 167
Amino acid mutations

SNP position(s) on amino acid sequence   Alternative amino acid(s)   Previously known SNP?
4                                        V -> F                      No
10                                       Y -> C                      No

Variant protein F05068_PEA_1_P7 (SEQ ID NO:1304) is encoded by the following transcript(s): F05068_PEA_1_T3 (SEQ ID NO:25) and F05068_PEA_1_T6 (SEQ ID NO:27), for which the sequence(s) is/are given at the end of the application.

The coding portion of transcript F05068_PEA_1_T3 (SEQ ID NO:25) is shown in bold; this coding portion starts at position 267 and ends at position 365. The transcript also has the following SNPs as listed in Table 168 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein F05068_PEA_1_P7 (SEQ ID NO:1304) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 168
Nucleic acid SNPs

SNP position on nucleotide sequence   Alternative nucleic acid   Previously known SNP?
26                                    C -> T                     Yes
164                                   T ->                       No
593                                   G -> C                     Yes
860                                   C ->                       No
860                                   C -> A                     No
1022                                  G -> A                     No
1023                                  G -> A                     No
1023                                  G -> C                     Yes
1084                                  G -> A                     Yes
1088                                  C ->                       No
1088                                  C -> A                     No
1106                                  C ->                       No
177                                   T ->                       No
1106                                  C -> A                     No
1149                                  G ->                       No
1154                                  C ->                       No
1171                                  T -> G                     Yes
1192                                  G ->                       No
1224                                  C ->                       No
1266                                  C ->                       No
1282                                  C -> T                     No
1381                                  G -> A                     No
1450                                  T ->                       No
206                                   C -> T                     Yes
1457                                  T -> G                     No
1534                                  C ->                       No
1535                                  C ->                       No
1554                                  A -> G                     Yes
1572                                  A -> C                     No
1572                                  A -> G                     No
1655                                  A -> C                     Yes
1669                                  T -> C                     Yes
1721                                  C -> T                     No
245                                   G ->                       No
259                                   C ->                       No
276                                   G -> T                     No
295                                   A -> G                     No
317                                   A -> C                     Yes
566                                   C -> G                     Yes

The coding portion of transcript F05068_PEA_1_T6 (SEQ ID NO:27) is shown in bold; this coding portion starts at position 267 and ends at position 365. The transcript also has the following SNPs as listed in Table 169 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein F05068_PEA_1_P7 (SEQ ID NO:1304) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 169
Nucleic acid SNPs

SNP position on nucleotide sequence   Alternative nucleic acid   Previously known SNP?
26                                    C -> T                     Yes
164                                   T ->                       No
593                                   G -> C                     Yes
739                                   C -> G                     Yes
1093                                  C ->                       No
1093                                  C -> A                     No
1255                                  G -> A                     No
1256                                  G -> A                     No
1256                                  G -> C                     Yes
1317                                  G -> A                     Yes
1321                                  C ->                       No
1321                                  C -> A                     No
177                                   T ->                       No
1339                                  C ->                       No
1339                                  C -> A                     No
1382                                  G ->                       No
1387                                  C ->                       No
1404                                  T -> G                     Yes
1425                                  G ->                       No
1457                                  C ->                       No
1499                                  C ->                       No
1515                                  C -> T                     No
1614                                  G -> A                     No
206                                   C -> T                     Yes
1683                                  T ->                       No
1690                                  T -> G                     No
1767                                  C ->                       No
1768                                  C ->                       No
1787                                  A -> G                     Yes
1805                                  A -> C                     No
1805                                  A -> G                     No
1888                                  A -> C                     Yes
1902                                  T -> C                     Yes
1954                                  C -> T                     No
245                                   G ->                       No
259                                   C ->                       No
276                                   G -> T                     No
295                                   A -> G                     No
317                                   A -> C                     Yes
566                                   C -> G                     Yes

Variant protein F05068_PEA_1_P8 (SEQ ID NO:1305) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) F05068_PEA_1_T4 (SEQ ID NO:26). An alignment is given to the known protein (ADM precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between F05068_PEA_1_P8 (SEQ ID NO:1305) and ADML_HUMAN (SEQ ID NO:1423):

1. An isolated chimeric polypeptide encoding for F05068_PEA_1_P8 (SEQ ID NO:1305), comprising a first amino acid sequence being at least 90% homologous to MKLVSVALMYLGSLAFLGADTARLDVASEFRKKWNKWALSRGKRELRMSSSYPTGLADVKAGPAQTLIRPQDMKGASRSPED corresponding to amino acids 1-82 of ADML_HUMAN (SEQ ID NO:1423), which also corresponds to amino acids 1-82 of F05068_PEA_1_P8 (SEQ ID NO:1305), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence R corresponding to amino acids 83-83 of F05068_PEA_1_P8 (SEQ ID NO:1305), wherein said first and second amino acid sequences are contiguous and in a sequential order.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein F05068_PEA_1_P8 (SEQ ID NO:1305) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 170 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein F05068_PEA_1_P8 (SEQ ID NO:1305) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 170
Amino acid mutations

SNP position(s) on amino acid sequence   Alternative amino acid(s)   Previously known SNP?
4                                        V -> F                      No
50                                       S -> R                      Yes
10                                       Y -> C                      No

Variant protein F05068_PEA_1_P8 (SEQ ID NO:1305) is encoded by the following transcript(s): F05068_PEA_1_T4 (SEQ ID NO:26), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript F05068_PEA_1_T4 (SEQ ID NO:26) is shown in bold; this coding portion starts at position 267 and ends at position 515. The transcript also has the following SNPs as listed in Table 171 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein F05068_PEA_1_P8 (SEQ ID NO:1305) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 171
Nucleic acid SNPs

SNP position on nucleotide sequence   Alternative nucleic acid   Previously known SNP?
26                                    C -> T                     Yes
164                                   T ->                       No
443                                   G -> C                     Yes
589                                   C -> G                     Yes
943                                   C ->                       No
943                                   C -> A                     No
1105                                  G -> A                     No
1106                                  G -> A                     No
1106                                  G -> C                     Yes
1167                                  G -> A                     Yes
1171                                  C ->                       No
1171                                  C -> A                     No
177                                   T ->                       No
1189                                  C ->                       No
1189                                  C -> A                     No
1232                                  G ->                       No
1237                                  C ->                       No
1254                                  T -> G                     Yes
1275                                  G ->                       No
1307                                  C ->                       No
1349                                  C ->                       No
1365                                  C -> T                     No
1464                                  G -> A                     No
206                                   C -> T                     Yes
1533                                  T ->                       No
1540                                  T -> G                     No
1617                                  C ->                       No
1618                                  C ->                       No
1637                                  A -> G                     Yes
1655                                  A -> C                     No
1655                                  A -> G                     No
1738                                  A -> C                     Yes
1752                                  T -> C                     Yes
1804                                  C -> T                     No
245                                   G ->                       No
259                                   C ->                       No
276                                   G -> T                     No
295                                   A -> G                     No
317                                   A -> C                     Yes
416                                   C -> G                     Yes
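The coding-portion coordinates given for the F05068 transcripts can be related to the lengths of the encoded variants with simple arithmetic. The sketch below assumes the positions are 1-based and inclusive and that the stated coding portion does not include a stop codon; both are assumptions of this illustration, and the function name `coding_summary` is hypothetical.

```python
def coding_summary(start, end):
    """Return (length in nucleotides, whole codons, leftover nucleotides)
    for a coding portion given by 1-based inclusive positions."""
    length = end - start + 1
    codons, leftover = divmod(length, 3)
    return length, codons, leftover


# Coding portions as stated in the text for each transcript.
coding_portions = {
    "F05068_PEA_1_T3": (267, 365),
    "F05068_PEA_1_T6": (267, 365),
    "F05068_PEA_1_T4": (267, 515),
}
summaries = {name: coding_summary(*span) for name, span in coding_portions.items()}
```

Under these assumptions T3 and T6 each give 99 nucleotides (33 codons) and T4 gives 249 nucleotides (83 codons), which agrees with the amino acid ranges 1-33 and 1-83 used in the comparison reports for F05068_PEA_1_P7 and F05068_PEA_1_P8.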

As noted above, cluster F05068 features 12 segment(s), which were listed in Table 162 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster F05068_PEA_1_node_0 (SEQ ID NO:1145) according to the present invention is supported by 143 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): F05068_PEA_1_T3 (SEQ ID NO:25), F05068_PEA_1_T4 (SEQ ID NO:26) and F05068_PEA_1_T6 (SEQ ID NO:27). Table 172 below describes the starting and ending position of this segment on each transcript.

TABLE 172
Segment location on transcripts

Transcript name                   Segment starting position   Segment ending position
F05068_PEA_1_T3 (SEQ ID NO: 25)   1                           245
F05068_PEA_1_T4 (SEQ ID NO: 26)   1                           245
F05068_PEA_1_T6 (SEQ ID NO: 27)   1                           245

Segment cluster F05068_PEA_1_node_10 (SEQ ID NO:1146) according to the present invention is supported by 127 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): F05068_PEA_1_T3 (SEQ ID NO:25), F05068_PEA_1_T4 (SEQ ID NO:26) and F05068_PEA_1_T6 (SEQ ID NO:27). Table 173 below describes the starting and ending position of this segment on each transcript.

TABLE 173
Segment location on transcripts

Transcript name                   Segment starting position   Segment ending position
F05068_PEA_1_T3 (SEQ ID NO: 25)   749                         909
F05068_PEA_1_T4 (SEQ ID NO: 26)   832                         992
F05068_PEA_1_T6 (SEQ ID NO: 27)   982                         1142

Segment cluster F05068_PEA_1_node_12 (SEQ ID NO:1147) according to the present invention is supported by 123 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): F05068_PEA_1_T3 (SEQ ID NO:25), F05068_PEA_1_T4 (SEQ ID NO:26) and F05068_PEA_1_T6 (SEQ ID NO:27). Table 174 below describes the starting and ending position of this segment on each transcript.

TABLE 174
Segment location on transcripts

Transcript name                   Segment starting position   Segment ending position
F05068_PEA_1_T3 (SEQ ID NO: 25)   986                         1106
F05068_PEA_1_T4 (SEQ ID NO: 26)   1069                        1189
F05068_PEA_1_T6 (SEQ ID NO: 27)   1219                        1339

Segment cluster F05068_PEA_1_node_13 (SEQ ID NO:1148) according to the present invention is supported by 181 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): F05068_PEA_1_T3 (SEQ ID NO:25), F05068_PEA_1_T4 (SEQ ID NO:26) and F05068_PEA_1_T6 (SEQ ID NO:27). Table 175 below describes the starting and ending position of this segment on each transcript.

TABLE 175
Segment location on transcripts

Transcript name                   Segment starting position   Segment ending position
F05068_PEA_1_T3 (SEQ ID NO: 25)   1107                        1737
F05068_PEA_1_T4 (SEQ ID NO: 26)   1190                        1820
F05068_PEA_1_T6 (SEQ ID NO: 27)   1340                        1970

Segment cluster F05068_PEA_1_node_4 (SEQ ID NO:1149) according to the present invention is supported by 15 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): F05068_PEA_1_T3 (SEQ ID NO:25) and F05068_PEA_1_T6 (SEQ ID NO:27). Table 176 below describes the starting and ending position of this segment on each transcript.

TABLE 176 Segment location on transcripts

| Transcript name | Segment starting position | Segment ending position |
|---|---|---|
| F05068_PEA_1_T3 (SEQ ID NO: 25) | 365 | 514 |
| F05068_PEA_1_T6 (SEQ ID NO: 27) | 365 | 514 |

Segment cluster F05068_PEA_1_node_8 (SEQ ID NO:1150) according to the present invention is supported by 13 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): F05068_PEA_1_T4 (SEQ ID NO:26) and F05068_PEA_1_T6 (SEQ ID NO:27). Table 177 below describes the starting and ending position of this segment on each transcript.

TABLE 177 Segment location on transcripts

| Transcript name | Segment starting position | Segment ending position |
|---|---|---|
| F05068_PEA_1_T4 (SEQ ID NO: 26) | 515 | 747 |
| F05068_PEA_1_T6 (SEQ ID NO: 27) | 665 | 897 |

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.

Segment cluster F05068_PEA_1_node_11 (SEQ ID NO:1151) according to the present invention is supported by 112 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): F05068_PEA_1_T3 (SEQ ID NO:25), F05068_PEA_1_T4 (SEQ ID NO:26) and F05068_PEA_1_T6 (SEQ ID NO:27). Table 178 below describes the starting and ending position of this segment on each transcript.

TABLE 178 Segment location on transcripts

| Transcript name | Segment starting position | Segment ending position |
|---|---|---|
| F05068_PEA_1_T3 (SEQ ID NO: 25) | 910 | 985 |
| F05068_PEA_1_T4 (SEQ ID NO: 26) | 993 | 1068 |
| F05068_PEA_1_T6 (SEQ ID NO: 27) | 1143 | 1218 |

Segment cluster F05068_PEA_1_node_3 (SEQ ID NO:1152) according to the present invention is supported by 145 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): F05068_PEA_1_T3 (SEQ ID NO:25), F05068_PEA_1_T4 (SEQ ID NO:26) and F05068_PEA_1_T6 (SEQ ID NO:27). Table 179 below describes the starting and ending position of this segment on each transcript.

TABLE 179 Segment location on transcripts

| Transcript name | Segment starting position | Segment ending position |
|---|---|---|
| F05068_PEA_1_T3 (SEQ ID NO: 25) | 246 | 364 |
| F05068_PEA_1_T4 (SEQ ID NO: 26) | 246 | 364 |
| F05068_PEA_1_T6 (SEQ ID NO: 27) | 246 | 364 |

Segment cluster F05068_PEA_1_node_5 (SEQ ID NO:1153) according to the present invention is supported by 124 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): F05068_PEA_1_T3 (SEQ ID NO:25), F05068_PEA_1_T4 (SEQ ID NO:26) and F05068_PEA_1_T6 (SEQ ID NO:27). Table 180 below describes the starting and ending position of this segment on each transcript.

TABLE 180 Segment location on transcripts

| Transcript name | Segment starting position | Segment ending position |
|---|---|---|
| F05068_PEA_1_T3 (SEQ ID NO: 25) | 515 | 573 |
| F05068_PEA_1_T4 (SEQ ID NO: 26) | 365 | 423 |
| F05068_PEA_1_T6 (SEQ ID NO: 27) | 515 | 573 |

Segment cluster F05068_PEA_1_node_6 (SEQ ID NO:1154) according to the present invention is supported by 110 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): F05068_PEA_1_T3 (SEQ ID NO:25), F05068_PEA_1_T4 (SEQ ID NO:26) and F05068_PEA_1_T6 (SEQ ID NO:27). Table 181 below describes the starting and ending position of this segment on each transcript.

TABLE 181 Segment location on transcripts

| Transcript name | Segment starting position | Segment ending position |
|---|---|---|
| F05068_PEA_1_T3 (SEQ ID NO: 25) | 574 | 613 |
| F05068_PEA_1_T4 (SEQ ID NO: 26) | 424 | 463 |
| F05068_PEA_1_T6 (SEQ ID NO: 27) | 574 | 613 |

Segment cluster F05068_PEA_1_node_7 (SEQ ID NO:1155) according to the present invention is supported by 109 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): F05068_PEA_1_T3 (SEQ ID NO:25), F05068_PEA_1_T4 (SEQ ID NO:26) and F05068_PEA_1_T6 (SEQ ID NO:27). Table 182 below describes the starting and ending position of this segment on each transcript.

TABLE 182 Segment location on transcripts

| Transcript name | Segment starting position | Segment ending position |
|---|---|---|
| F05068_PEA_1_T3 (SEQ ID NO: 25) | 614 | 664 |
| F05068_PEA_1_T4 (SEQ ID NO: 26) | 464 | 514 |
| F05068_PEA_1_T6 (SEQ ID NO: 27) | 614 | 664 |

Segment cluster F05068_PEA_1_node_9 (SEQ ID NO:1156) according to the present invention is supported by 114 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): F05068_PEA_1_T3 (SEQ ID NO:25), F05068_PEA_1_T4 (SEQ ID NO:26) and F05068_PEA_1_T6 (SEQ ID NO:27). Table 183 below describes the starting and ending position of this segment on each transcript.

TABLE 183 Segment location on transcripts

| Transcript name | Segment starting position | Segment ending position |
|---|---|---|
| F05068_PEA_1_T3 (SEQ ID NO: 25) | 665 | 748 |
| F05068_PEA_1_T4 (SEQ ID NO: 26) | 748 | 831 |
| F05068_PEA_1_T6 (SEQ ID NO: 27) | 898 | 981 |

Variant protein alignment to the previously known protein:

Sequence name: /tmp/kEsi3RWsCN/lsvdhjfiNV:ADML_HUMAN (SEQ ID NO:1423)
Sequence documentation: Alignment of: F05068_PEA_1_P7 (SEQ ID NO:1304) x ADML_HUMAN (SEQ ID NO:1423)
Alignment segment 1/1:
Quality: 304.00
Escore: 0
Matching length: 33
Total length: 33
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

Sequence name: /tmp/tcrlWIx4kg/aghbr8Eh8n:ADML_HUMAN (SEQ ID NO:1423)
Sequence documentation: Alignment of: F05068_PEA_1_P8 (SEQ ID NO:1305) x ADML_HUMAN (SEQ ID NO:1423)
Alignment segment 1/1:
Quality: 791.00
Escore: 0
Matching length: 82
Total length: 82
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:


Description for Cluster H14624

Cluster H14624 features 1 transcript(s) and 15 segment(s) of interest, the names for which are given in Tables 184 and 185, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 186.

TABLE 184 Transcripts of interest

| Transcript Name | Sequence ID No. |
|---|---|
| H14624_T20 | 28 |

TABLE 185 Segments of interest

| Segment Name | Sequence ID No. |
|---|---|
| H14624_node_0 | 362 |
| H14624_node_16 | 363 |
| H14624_node_3 | 364 |
| H14624_node_10 | 365 |
| H14624_node_11 | 366 |
| H14624_node_12 | 367 |
| H14624_node_13 | 368 |
| H14624_node_14 | 370 |
| H14624_node_15 | 371 |
| H14624_node_4 | 372 |
| H14624_node_5 | 373 |
| H14624_node_6 | 374 |
| H14624_node_7 | 375 |
| H14624_node_8 | 376 |
| H14624_node_9 | 377 |

TABLE 186 Proteins of interest

| Protein Name | Sequence ID No. |
|---|---|
| H14624_P15 | 1306 |

Cluster H14624 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term “number” in the right hand column of the table and the numbers on the y-axis of FIG. 22 refer to weighted expression of ESTs in each category, as “parts per million” (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
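The “parts per million” ratio described above can be sketched as follows. This is a simplified illustration only: it uses raw EST counts and does not reproduce the weighting applied according to the previously described methods, and the function name and the counts in the example are hypothetical.

```python
def parts_per_million(cluster_est_count, total_est_count):
    """Ratio of ESTs for one cluster to all ESTs in a category, scaled to ppm.

    Simplifying assumption: plain counts are used, with no library weighting.
    """
    if total_est_count <= 0:
        raise ValueError("total_est_count must be positive")
    return 1_000_000 * cluster_est_count / total_est_count


# Hypothetical counts, not taken from the application:
# 3 ESTs of a cluster among 100,000 ESTs sampled from one tissue category.
example_ppm = parts_per_million(3, 100_000)
print(example_ppm)
```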

Overall, the following results were obtained as shown with regard to the histograms in FIG. 22 and Table 187. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: colorectal cancer, epithelial malignant tumors, a mixture of malignant tumors from different tissues, lung malignant tumors and pancreas carcinoma.

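TABLE 187 Normal tissue distribution

| Name of Tissue | Number |
|---|---|
| adrenal | 0 |
| bladder | 410 |
| bone | 71 |
| brain | 42 |
| colon | 6 |
| epithelial | 91 |
| general | 74 |
| head and neck | 0 |
| kidney | 0 |
| lung | 30 |
| breast | 949 |
| ovary | 7 |
| pancreas | 2 |
| prostate | 94 |
| stomach | 3 |
| Thyroid | 128 |
| uterus | 54 |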

TABLE 188 P values and ratios for expression in cancerous tissue

| Name of Tissue | P1 | P2 | SP1 | R3 | SP2 | R4 |
|---|---|---|---|---|---|---|
| adrenal | 4.2e−01 | 4.6e−01 | 4.6e−01 | 2.2 | 5.3e−01 | 1.9 |
| bladder | 5.4e−01 | 6.0e−01 | 1.2e−02 | 1.6 | 2.2e−01 | 1.0 |
| bone | 4.9e−01 | 8.5e−01 | 1.8e−01 | 1.3 | 7.5e−01 | 0.6 |
| brain | 4.7e−01 | 7.0e−01 | 6.3e−05 | 2.3 | 9.4e−03 | 1.4 |
| colon | 4.4e−02 | 9.9e−02 | 4.5e−03 | 5.4 | 2.0e−02 | 3.9 |
| epithelial | 7.7e−03 | 3.6e−01 | 1.5e−11 | 2.0 | 2.9e−02 | 1.1 |
| general | 5.1e−03 | 5.9e−01 | 8.3e−21 | 2.2 | 1.5e−04 | 1.2 |
| head and neck | 1.4e−01 | 2.8e−01 | 4.6e−01 | 2.2 | 7.5e−01 | 1.3 |
| kidney | 6.5e−01 | 7.2e−01 | 5.8e−01 | 1.7 | 7.0e−01 | 1.4 |
| lung | 6.1e−02 | 1.4e−01 | 3.3e−05 | 5.8 | 8.1e−03 | 2.9 |
| breast | 2.4e−01 | 4.1e−01 | 1 | 0.3 | 1 | 0.2 |
| ovary | 8.5e−01 | 7.3e−01 | 6.8e−01 | 1.2 | 1.6e−01 | 1.6 |
| pancreas | 7.5e−03 | 4.9e−02 | 1.2e−21 | 22.4 | 2.4e−16 | 15.1 |
| prostate | 8.3e−01 | 8.9e−01 | 7.2e−01 | 0.8 | 8.8e−01 | 0.6 |
| stomach | 4.6e−01 | 8.5e−01 | 1.0e−03 | 2.7 | 1.1e−01 | 1.4 |
| Thyroid | 7.0e−01 | 7.0e−01 | 5.9e−01 | 1.0 | 5.9e−01 | 1.0 |
| uterus | 4.1e−01 | 7.3e−01 | 2.3e−01 | 1.2 | 6.2e−01 | 0.7 |

As noted above, contig H14624 features 1 transcript(s), which were listed in Table 184 above. A description of each variant protein according to the present invention is now provided.

Variant protein H14624_P15 (SEQ ID NO:1306) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) H14624_T20 (SEQ ID NO:28). One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between H14624_P15 (SEQ ID NO:1306) and Q9HAP5 (SEQ ID NO:1701):

1. An isolated chimeric polypeptide encoding for H14624_P15 (SEQ ID NO:1306), comprising a first amino acid sequence being at least 90% homologous to MLQGPGSLLLLFLASHCCLGSARGLFLFGQPDFSYKRSNCKPIPANLQLCHGIEYQNMRLPNLLGHETMKEVLEQAGAWIPVMKQCHPDTKKFLCSLFAPVCLDDLDETIQPCHSLCVQVKDRCAPVMSAFGFPWPDMLECDRFPQDNDLCIPLASSDHLLPATEE corresponding to amino acids 1-167 of Q9HAP5 (SEQ ID NO:1701), which also corresponds to amino acids 1-167 of H14624_P15 (SEQ ID NO:1306), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GKPSLLLPHSLLG (SEQ ID NO: 1765) corresponding to amino acids 168-180 of H14624_P15 (SEQ ID NO:1306), wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of H14624_P15 (SEQ ID NO:1306), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GKPSLLLPHSLLG (SEQ ID NO: 1765) in H14624_P15 (SEQ ID NO:1306).
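The percentage thresholds recited above can be illustrated with a minimal sketch that compares a candidate peptide against the tail sequence GKPSLLLPHSLLG (SEQ ID NO: 1765). The sketch makes strong simplifying assumptions: it scores position-by-position identity over two ungapped sequences of equal length, which is not necessarily how homology is determined according to the present invention. The function name and the candidate peptide are hypothetical.

```python
# Tail of H14624_P15 as recited above (amino acids 168-180).
TAIL = "GKPSLLLPHSLLG"


def percent_identity(reference, candidate):
    """Percent of positions with identical residues.

    Assumption: ungapped comparison of equal-length sequences; a real
    homology determination would normally use an alignment program.
    """
    if len(reference) != len(candidate):
        raise ValueError("sequences must have equal length in this sketch")
    matches = sum(1 for a, b in zip(reference, candidate) if a == b)
    return 100.0 * matches / len(reference)


# Hypothetical candidate differing from the tail at the last residue only.
candidate = "GKPSLLLPHSLLA"
identity = percent_identity(TAIL, candidate)
print(f"{identity:.1f}% identical; meets 90% threshold: {identity >= 90.0}")
```

The tail length of 13 residues agrees with the stated positions 168-180, and the first (1-167) and second (168-180) sequences together account for 180 residues.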

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein H14624_P15 (SEQ ID NO:1306) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 189 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein H14624_P15 (SEQ ID NO:1306) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 189 Amino acid mutations

| SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP? |
|---|---|---|
| 11 | L -> | No |
| 170 | P -> S | Yes |
| 28 | F -> | No |
| 29 | G -> | No |
| 38 | S -> | No |
| 45 | A -> V | Yes |
| 60 | L -> | No |

Variant protein H14624_P15 (SEQ ID NO:1306) is encoded by the following transcript(s): H14624_T20 (SEQ ID NO:28), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript H14624_T20 (SEQ ID NO:28) is shown in bold; this coding portion starts at position 857 and ends at position 1396. The transcript also has the following SNPs as listed in Table 190 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein H14624_P15 (SEQ ID NO:1306) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 190 Nucleic acid SNPs

| SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP? |
|---|---|---|
| 389 | A -> G | No |
| 476 | C -> T | No |
| 969 | G -> | No |
| 988 | G -> T | Yes |
| 990 | C -> T | Yes |
| 1034 | C -> | No |
| 1168 | C -> T | Yes |
| 1364 | C -> T | Yes |
| 488 | T -> C | No |
| 819 | C -> G | Yes |
| 851 | C -> | No |
| 887 | C -> | No |
| 922 | G -> A | Yes |
| 934 | C -> T | Yes |
| 938 | T -> | No |
| 943 | C -> | No |
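The relationship between the nucleotide positions of Table 190 and the amino acid positions of Table 189 can be illustrated with a short sketch. It assumes 1-based positions and that the coding portion (857 to 1396) is read in frame from its first nucleotide; the function name `codon_number` is hypothetical and introduced only for this example.

```python
# Coding portion of transcript H14624_T20 as stated above (1-based, inclusive).
CODING_START = 857
CODING_END = 1396


def codon_number(nt_position, coding_start=CODING_START, coding_end=CODING_END):
    """Map a transcript nucleotide position to a 1-based codon number.

    Returns None for positions outside the coding portion.
    Assumption: the reading frame begins at coding_start.
    """
    if not coding_start <= nt_position <= coding_end:
        return None
    return (nt_position - coding_start) // 3 + 1


# Nucleotide SNP positions copied from Table 190.
table_190_positions = [389, 476, 969, 988, 990, 1034, 1168, 1364,
                       488, 819, 851, 887, 922, 934, 938, 943]
# Amino acid positions of the non-silent SNPs copied from Table 189.
table_189_positions = {11, 170, 28, 29, 38, 45, 60}

codons = {pos: codon_number(pos) for pos in table_190_positions}
coding_codons = {c for c in codons.values() if c is not None}

# Number of codons spanned by the coding portion.
codon_count = (CODING_END - CODING_START + 1) // 3
print(codons)
print(codon_count)
```

Under these assumptions every amino acid position in Table 189 is reached by at least one nucleotide position in Table 190 (for example, position 887 falls in codon 11 and position 1364 in codon 170), and the coding portion spans 180 codons, matching the 180 residues described for H14624_P15.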

As noted above, cluster H14624 features 15 segment(s), which were listed in Table 185 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster H14624_node_0 (SEQ ID NO:1157) according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H14624_T20 (SEQ ID NO:28). Table 191 below describes the starting and ending position of this segment on each transcript.

TABLE 191 Segment location on transcripts

| Transcript name | Segment starting position | Segment ending position |
|---|---|---|
| H14624_T20 (SEQ ID NO: 28) | 1 | 573 |

Segment cluster H14624_node_16 (SEQ ID NO:1158) according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H14624_T20 (SEQ ID NO:28). Table 192 below describes the starting and ending position of this segment on each transcript.

TABLE 192 Segment location on transcripts

| Transcript name | Segment starting position | Segment ending position |
|---|---|---|
| H14624_T20 (SEQ ID NO: 28) | 1359 | 1745 |

Segment cluster H14624_node_3 (SEQ ID NO:1159) according to the present invention is supported by 67 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H14624_T20 (SEQ ID NO:28). Table 193 below describes the starting and ending position of this segment on each transcript.

TABLE 193 Segment location on transcripts

| Transcript name | Segment starting position | Segment ending position |
|---|---|---|
| H14624_T20 (SEQ ID NO: 28) | 574 | 822 |

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.

Segment cluster H14624_node_10 (SEQ ID NO:1160) according to the present invention can be found in the following transcript(s): H14624_T20 (SEQ ID NO:28). Table 194 below describes the starting and ending position of this segment on each transcript.

TABLE 194 Segment location on transcripts

| Transcript name | Segment starting position | Segment ending position |
|---|---|---|
| H14624_T20 (SEQ ID NO: 28) | 1070 | 1079 |

Segment cluster H14624_node_11 (SEQ ID NO:1161) according to the present invention is supported by 99 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H14624_T20 (SEQ ID NO:28). Table 195 below describes the starting and ending position of this segment on each transcript.

TABLE 195 Segment location on transcripts

| Transcript name | Segment starting position | Segment ending position |
|---|---|---|
| H14624_T20 (SEQ ID NO: 28) | 1080 | 1114 |

Segment cluster H14624_node_12 (SEQ ID NO:1162) according to the present invention can be found in the following transcript(s): H14624_T20 (SEQ ID NO:28). Table 196 below describes the starting and ending position of this segment on each transcript.

TABLE 196 Segment location on transcripts

| Transcript name | Segment starting position | Segment ending position |
|---|---|---|
| H14624_T20 (SEQ ID NO: 28) | 1115 | 1135 |

Segment cluster H14624_node_13 (SEQ ID NO:1163) according to the present invention is supported by 124 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H14624_T20 (SEQ ID NO:28). Table 197 below describes the starting and ending position of this segment on each transcript.

TABLE 197 Segment location on transcripts

| Transcript name | Segment starting position | Segment ending position |
|---|---|---|
| H14624_T20 (SEQ ID NO: 28) | 1136 | 1227 |

Segment cluster H14624_node_14 (SEQ ID NO:1164) according to the present invention is supported by 114 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H14624_T20 (SEQ ID NO:28). Table 198 below describes the starting and ending position of this segment on each transcript.

TABLE 198 Segment location on transcripts

| Transcript name | Segment starting position | Segment ending position |
|---|---|---|
| H14624_T20 (SEQ ID NO: 28) | 1228 | 1287 |

Segment cluster H14624_node_15 (SEQ ID NO:1165) according to the present invention is supported by 124 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H14624_T20 (SEQ ID NO:28). Table 199 below describes the starting and ending position of this segment on each transcript.

TABLE 199 Segment location on transcripts

| Transcript name | Segment starting position | Segment ending position |
|---|---|---|
| H14624_T20 (SEQ ID NO: 28) | 1288 | 1358 |

Segment cluster H14624_node_4 (SEQ ID NO:1166) according to the present invention is supported by 65 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H14624_T20 (SEQ ID NO:28). Table 200 below describes the starting and ending position of this segment on each transcript.

TABLE 200 Segment location on transcripts

| Transcript name | Segment starting position | Segment ending position |
|---|---|---|
| H14624_T20 (SEQ ID NO: 28) | 823 | 892 |

Segment cluster H14624_node_5 (SEQ ID NO:1167) according to the present invention can be found in the following transcript(s): H14624_T20 (SEQ ID NO:28). Table 201 below describes the starting and ending position of this segment on each transcript.

TABLE 201 Segment location on transcripts

| Transcript name | Segment starting position | Segment ending position |
|---|---|---|
| H14624_T20 (SEQ ID NO: 28) | 893 | 903 |

Segment cluster H14624_node_6 (SEQ ID NO:1168) according to the present invention can be found in the following transcript(s): H14624_T20 (SEQ ID NO:28). Table 202 below describes the starting and ending position of this segment on each transcript.

TABLE 202 Segment location on transcripts

| Transcript name | Segment starting position | Segment ending position |
|---|---|---|
| H14624_T20 (SEQ ID NO: 28) | 904 | 927 |

Segment cluster H14624_node_7 (SEQ ID NO:1169) according to the present invention can be found in the following transcript(s): H14624_T20 (SEQ ID NO:28). Table 203 below describes the starting and ending position of this segment on each transcript.

TABLE 203 Segment location on transcripts

| Transcript name | Segment starting position | Segment ending position |
|---|---|---|
| H14624_T20 (SEQ ID NO: 28) | 928 | 934 |

Segment cluster H14624_node_8 (SEQ ID NO:1170) according to the present invention is supported by 85 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H14624_T20 (SEQ ID NO:28). Table 204 below describes the starting and ending position of this segment on each transcript.

TABLE 204 Segment location on transcripts

| Transcript name | Segment starting position | Segment ending position |
|---|---|---|
| H14624_T20 (SEQ ID NO: 28) | 935 | 1014 |

Segment cluster H14624_node_9 (SEQ ID NO:1171) according to the present invention is supported by 87 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H14624_T20 (SEQ ID NO:28). Table 205 below describes the starting and ending position of this segment on each transcript.

TABLE 205 Segment location on transcripts

| Transcript name | Segment starting position | Segment ending position |
|---|---|---|
| H14624_T20 (SEQ ID NO: 28) | 1015 | 1069 |
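The segment coordinates given in Tables 191 to 205 can be cross-checked against one another. The sketch below tests whether the 15 segments of cluster H14624 tile transcript H14624_T20 without gaps or overlaps, assuming 1-based inclusive positions; the function name `tiles_contiguously` is hypothetical and the check is illustrative only.

```python
# (segment name, start, end) on H14624_T20, copied from Tables 191-205.
segments = [
    ("H14624_node_0", 1, 573),
    ("H14624_node_16", 1359, 1745),
    ("H14624_node_3", 574, 822),
    ("H14624_node_10", 1070, 1079),
    ("H14624_node_11", 1080, 1114),
    ("H14624_node_12", 1115, 1135),
    ("H14624_node_13", 1136, 1227),
    ("H14624_node_14", 1228, 1287),
    ("H14624_node_15", 1288, 1358),
    ("H14624_node_4", 823, 892),
    ("H14624_node_5", 893, 903),
    ("H14624_node_6", 904, 927),
    ("H14624_node_7", 928, 934),
    ("H14624_node_8", 935, 1014),
    ("H14624_node_9", 1015, 1069),
]


def tiles_contiguously(segs):
    """True if segments start at position 1 and each begins right after the previous one ends."""
    ordered = sorted(segs, key=lambda s: s[1])
    if ordered[0][1] != 1:
        return False
    return all(nxt[1] == prev[2] + 1 for prev, nxt in zip(ordered, ordered[1:]))


is_tiled = tiles_contiguously(segments)
covered_span = max(end for _, _, end in segments)
total_length = sum(end - start + 1 for _, start, end in segments)
print(is_tiled, covered_span, total_length)
```

A check of this kind also helps when reading flattened table rows, since a fused value such as the start and end positions of a segment must split so that adjacent segments abut.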

Variant protein alignment to the previously known protein:

Sequence name: /tmp/Upb1SbFkrj/N4PrGQAB2V:Q9HAP5 (SEQ ID NO: 1701)
Sequence documentation: Alignment of: H14624_P15 (SEQ ID NO: 1306) × Q9HAP5 (SEQ ID NO: 1701)
Alignment segment 1/1:
Quality: 1702.00
Escore: 0
Matching length: 167
Total length: 167
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

Description for Cluster H38804

Cluster H38804 features 2 transcript(s) and 20 segment(s) of interest, the names for which are given in Tables 206 and 207, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 208.

TABLE 206 Transcripts of interest

| Transcript name | Sequence ID No. |
|---|---|
| H38804_PEA_1_T24 | 29 |
| H38804_PEA_1_T8 | 30 |

TABLE 207 Segments of interest

| Segment Name | Sequence ID No. |
|---|---|
| H38804_PEA_1_node_0 | 378 |
| H38804_PEA_1_node_1 | 379 |
| H38804_PEA_1_node_16 | 380 |
| H38804_PEA_1_node_19 | 381 |
| H38804_PEA_1_node_24 | 382 |
| H38804_PEA_1_node_25 | 383 |
| H38804_PEA_1_node_28 | 384 |
| H38804_PEA_1_node_29 | 385 |
| H38804_PEA_1_node_30 | 386 |
| H38804_PEA_1_node_10 | 387 |
| H38804_PEA_1_node_12 | 388 |
| H38804_PEA_1_node_13 | 389 |
| H38804_PEA_1_node_14 | 390 |
| H38804_PEA_1_node_2 | 391 |
| H38804_PEA_1_node_20 | 392 |
| H38804_PEA_1_node_23 | 393 |
| H38804_PEA_1_node_26 | 394 |
| H38804_PEA_1_node_3 | 395 |
| H38804_PEA_1_node_4 | 396 |
| H38804_PEA_1_node_5 | 397 |

TABLE 208 Proteins of interest

| Protein Name | Sequence ID No. |
|---|---|
| H38804_PEA_1_P5 | 1307 |
| H38804_PEA_1_P17 | 1308 |

These sequences are variants of the known protein Mitotic checkpoint protein BUB3 (SwissProt accession identifier BUB3_HUMAN), SEQ ID NO:1424, referred to herein as the previously known protein. Protein Mitotic checkpoint protein BUB3 (SEQ ID NO:1424) is known or believed to have the following function(s): Required for kinetochore localization of BUB1. The sequence for protein Mitotic checkpoint protein BUB3 is given at the end of the application, as “Mitotic checkpoint protein BUB3 amino acid sequence”. Known polymorphisms for this sequence are as shown in Table 209.

TABLE 209 Amino acid mutations for Known Protein

| SNP position(s) on amino acid sequence | Comment |
|---|---|
| 326-327 | Missing |

Protein Mitotic checkpoint protein BUB3 (SEQ ID NO:1424) localization is believed to be Nuclear.

The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: mitosis; mitotic checkpoint; mitotic spindle checkpoint; cell proliferation, which are annotation(s) related to Biological Process; and nucleus, which are annotation(s) related to Cellular Component.

The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <dot expasy dot ch/sprot/>; or Locuslink, available from <dot ncbi dot nlm dot nih dot gov/projects/LocusLink/>.

Cluster H38804 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term “number” in the right hand column of the table and the numbers on the y-axis of FIG. 23 refer to weighted expression of ESTs in each category, as “parts per million” (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).

Overall, the following results were obtained as shown with regard to the histograms in FIG. 23 and Table 210. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: transitional cell carcinoma, brain malignant tumors, a mixture of malignant tumors from different tissues and gastric carcinoma.

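TABLE 210 Normal tissue distribution

| Name of Tissue | Number |
|---|---|
| adrenal | 124 |
| bladder | 0 |
| bone | 64 |
| brain | 40 |
| colon | 75 |
| epithelial | 86 |
| general | 79 |
| head and neck | 334 |
| kidney | 69 |
| liver | 14 |
| lung | 125 |
| lymph nodes | 218 |
| breast | 263 |
| bone marrow | 62 |
| muscle | 27 |
| ovary | 109 |
| pancreas | 43 |
| prostate | 32 |
| skin | 53 |
| stomach | 0 |
| T cells | 557 |
| Thyroid | 257 |
| uterus | 113 |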

TABLE 211 P values and ratios for expression in cancerous tissue

| Name of Tissue | P1 | P2 | SP1 | R3 | SP2 | R4 |
|---|---|---|---|---|---|---|
| adrenal | 6.3e−01 | 5.4e−01 | 1.8e−01 | 1.4 | 5.0e−02 | 1.9 |
| bladder | 7.0e−02 | 2.6e−02 | 3.2e−02 | 4.9 | 9.9e−03 | 6.2 |
| bone | 3.7e−01 | 2.3e−01 | 7.9e−01 | 0.9 | 3.2e−01 | 1.6 |
| brain | 3.1e−02 | 4.2e−03 | 5.3e−01 | 1.2 | 1.1e−02 | 2.1 |
| colon | 2.4e−01 | 1.1e−01 | 2.0e−01 | 1.7 | 1.6e−01 | 1.8 |
| epithelial | 1.1e−01 | 2.2e−02 | 1.5e−01 | 1.2 | 8.6e−03 | 1.3 |
| general | 2.3e−02 | 2.3e−04 | 9.0e−02 | 1.2 | 4.7e−05 | 1.4 |
| head and neck | 4.4e−01 | 4.7e−01 | 9.2e−01 | 0.6 | 8.9e−01 | 0.5 |
| kidney | 8.2e−01 | 8.4e−01 | 9.0e−01 | 0.8 | 3.5e−01 | 1.0 |
| liver | 8.3e−01 | 1.5e−01 | 1 | 0.8 | 5.3e−02 | 2.8 |
| lung | 6.9e−01 | 8.1e−01 | 5.1e−01 | 1.1 | 6.0e−01 | 0.8 |
| lymph nodes | 5.1e−01 | 6.9e−01 | 5.0e−01 | 0.9 | 9.5e−01 | 0.5 |
| breast | 4.9e−01 | 4.2e−01 | 9.7e−01 | 0.5 | 9.5e−01 | 0.5 |
| bone marrow | 6.7e−01 | 5.4e−01 | 1 | 1.5 | 3.3e−02 | 2.6 |
| muscle | 8.5e−01 | 6.1e−01 | 1 | 0.4 | 6.3e−01 | 1.0 |
| ovary | 3.4e−01 | 3.3e−01 | 2.5e−01 | 1.5 | 4.7e−01 | 1.1 |
| pancreas | 4.3e−01 | 4.9e−01 | 6.3e−01 | 1.0 | 6.9e−01 | 0.9 |
| prostate | 7.4e−01 | 6.5e−01 | 1.5e−01 | 1.9 | 1.0e−01 | 2.0 |
| skin | 6.0e−01 | 1.7e−01 | 5.4e−01 | 1.4 | 2.7e−02 | 1.2 |
| stomach | 4.5e−02 | 9.9e−03 | 2.5e−01 | 3.1 | 4.3e−02 | 4.3 |
| T cells | 5.0e−01 | 6.7e−01 | 1 | 0.3 | 9.8e−01 | 0.5 |
| Thyroid | 5.7e−01 | 5.7e−01 | 1 | 0.4 | 1 | 0.4 |
| uterus | 5.7e−01 | 6.7e−01 | 9.2e−01 | 0.6 | 8.7e−01 | 0.5 |

As noted above, cluster H38804 features 2 transcript(s), which were listed in Table 206 above. These transcript(s) encode for protein(s) which are variant(s) of protein Mitotic checkpoint protein BUB3 (SEQ ID NO:1424). A description of each variant protein according to the present invention is now provided.

Variant protein H38804_PEA_1_P5 (SEQ ID NO:1307) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) H38804_PEA_1_T8 (SEQ ID NO:30). An alignment is given to the known protein (Mitotic checkpoint protein BUB3 (SEQ ID NO:1424)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between H38804_PEA_1_P5 (SEQ ID NO:1307) and BUB3_HUMAN (SEQ ID NO:1424):

1. An isolated chimeric polypeptide encoding for H38804_PEA_1_P5 (SEQ ID NO:1307), comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MGRVRTLAGECSAQAQAQSLLAVVLSAPPSGGTPSARLSVRSPSPRDPWGLWAPVLQ (SEQ ID NO: 1766) corresponding to amino acids 1-57 of H38804_PEA_1_P5 (SEQ ID NO:1307), and a second amino acid sequence being at least 90% homologous to MTGSNEFKLNQPPEDGISSVKFSPNTSQFLLVSSWDTSVRLYDVPANSMRLKYQHTGAVLDCAFYDPTHAWSGGLDHQLKMHDLNTDQENLVGTHDAPIRCVEYCPEVNVMVTGSWDQTVKLWDPRTPCNAGTFSQPEKVYTLSVSGDRLIVGTAGRRVLVWDLRNMGYVQQRRESSLKYQTRCIRAFPNKQGYVLSSIEGRVAVEYLDPSPEVQKKKYAFKCHRLKENNIEQIYPVNAISFHIHNTFATGGSDGFVNIWDPFNKKRLCQFHRYPTSIASLAFSNDGTTLAIASSYMYEMDDTEHPEDGIFIRQVTDAETKPK corresponding to amino acids 1-324 of BUB3_HUMAN (SEQ ID NO:1424), which also corresponds to amino acids 58-381 of H38804_PEA_1_P5 (SEQ ID NO:1307), wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a head of H38804_PEA_1_P5 (SEQ ID NO:1307), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MGRVRTLAGECSAQAQAQSLLAVVLSAPPSGGTPSARLSVRSPSPRDPWGLWAPVLQ (SEQ ID NO: 1766) of H38804_PEA_1_P5 (SEQ ID NO:1307).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because one of the two signal-peptide prediction programs (HMM: Signal peptide, NN: NO) predicts that this protein has a signal peptide.

Variant protein H38804_PEA_1_P5 (SEQ ID NO:1307) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 212 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein H38804_PEA_1_P5 (SEQ ID NO:1307) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 212 Amino acid mutations

| SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP? |
|---|---|---|
| 126 | H -> Y | No |
| 129 | S -> R | Yes |
| 256 | I -> | No |
| 256 | I -> N | No |
| 258 | G -> | No |
| 266 | D -> | No |
| 266 | D -> E | No |
| 266 | D -> N | Yes |
| 296 | A -> G | No |
| 296 | A -> V | No |
| 306 | F -> C | No |
| 314 | F -> | No |
| 215 | R -> K | No |
| 361 | T -> A | No |
| 381 | K -> | No |
| 217 | L -> | No |
| 220 | D -> | No |
| 220 | D -> E | No |
| 245 | F -> | No |
| 245 | F -> V | No |
| 248 | K -> | No |
| 248 | K -> Q | No |

Variant protein H38804_PEA_1_P5 (SEQ ID NO:1307) is encoded by the following transcript(s): H38804_PEA_1_T8 (SEQ ID NO:30), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript H38804_PEA_1_T8 (SEQ ID NO:30) is shown in bold; this coding portion starts at position 475 and ends at position 1617. The transcript also has the following SNPs as listed in Table 213 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein H38804_PEA_1_P5 (SEQ ID NO:1307) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 213 Nucleic acid SNPs

| SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP? |
|---|---|---|
| 161 | C -> | No |
| 167 | C -> | No |
| 1118 | G -> A | No |
| 1123 | T -> | No |
| 1134 | C -> | No |
| 1134 | C -> A | No |
| 1207 | T -> | No |
| 1207 | T -> G | No |
| 1216 | A -> | No |
| 1216 | A -> C | No |
| 1241 | T -> | No |
| 1241 | T -> A | No |
| 167 | C -> A | No |
| 1248 | C -> | No |
| 1248 | C -> G | No |
| 1270 | G -> A | Yes |
| 1272 | C -> | No |
| 1272 | C -> A | No |
| 1361 | C -> G | No |
| 1361 | C -> T | No |
| 1391 | T -> G | No |
| 1414 | T -> | No |
| 1419 | A -> G | No |
| 192 | T -> | No |
| 1555 | A -> G | No |
| 1615 | A -> | No |
| 1642 | G -> A | Yes |
| 1846 | T -> C | Yes |
| 2090 | A -> G | No |
| 2356 | C -> G | No |
| 2712 | G -> | No |
| 2909 | T -> C | No |
| 2909 | T -> G | No |
| 3020 | T -> G | No |
| 208 | C -> T | Yes |
| 3251 | T -> | No |
| 3306 | T -> | No |
| 3307 | T -> G | No |
| 3354 | T -> | No |
| 3521 | -> G | No |
| 3601 | C -> | No |
| 3601 | C -> G | No |
| 3633 | T -> | No |
| 3633 | T -> G | No |
| 3638 | A -> | No |
| 849 | G -> T | No |
| 3638 | A -> C | No |
| 3674 | C -> T | Yes |
| 3812 | T -> G | No |
| 3862 | G -> A | Yes |
| 3864 | T -> A | No |
| 3865 | T -> A | No |
| 3990 | T -> G | No |
| 4096 | T -> G | No |
| 4152 | G -> A | Yes |
| 850 | C -> T | No |
| 855 | C -> T | Yes |
| 861 | T -> G | Yes |
| 1098 | T -> C | No |
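The coordinates given above for variant protein H38804_PEA_1_P5 can be cross-checked with simple arithmetic. The sketch below assumes 1-based inclusive positions and an in-frame coding portion; the variable names are hypothetical, and the check is illustrative rather than part of the described method.

```python
# Head sequence (SEQ ID NO: 1766) as recited in the comparison report.
HEAD = "MGRVRTLAGECSAQAQAQSLLAVVLSAPPSGGTPSARLSVRSPSPRDPWGLWAPVLQ"

# Amino acids 1-324 of BUB3_HUMAN map to amino acids 58-381 of the variant.
known_start, known_end = 1, 324
variant_start, variant_end = 58, 381

# Coding portion of transcript H38804_PEA_1_T8 (1-based, inclusive).
coding_start, coding_end = 475, 1617

head_length = len(HEAD)
known_length = known_end - known_start + 1
variant_length = head_length + known_length
coding_nucleotides = coding_end - coding_start + 1

print(head_length, known_length, variant_length, coding_nucleotides)
```

Under these assumptions the head contributes 57 residues, the BUB3-derived portion 324, and the resulting 381 residues correspond to the 1143 nucleotides of the coding portion at three nucleotides per codon.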

Variant protein H38804_PEA_1_P17 (SEQ ID NO:1308) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) H38804_PEA_1_T24 (SEQ ID NO:29). An alignment is given to the known protein (Mitotic checkpoint protein BUB3 (SEQ ID NO:1424)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between H38804_PEA_(—)1_P17 (SEQ ID NO:1308) andBUB3_HUMAN (SEQ ID NO:1424):

1. An isolated chimeric polypeptide encoding for H38804_PEA_1_P17 (SEQ ID NO:1308), comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MGRVRTLAGECSAQAQAQSLLAVVLSAPPSGGTPSARLSVRSPSPRDPWGLWAPVLQ (SEQ ID NO:1766) corresponding to amino acids 1-57 of H38804_PEA_1_P17 (SEQ ID NO:1308), and a second amino acid sequence being at least 90% homologous to MTGSNEFKLNQPPEDGISSVKFSPNTSQFLLVSSWDTSVRLYDVPANSMRLKYQHTGAVLDCAFYDPTHAWSGGLDHQLKMHDLNTDQENLVGTHDAPIRCVEYCPEVNVMVTGSWDQTVKLWDPRTPCNAGTFSQPEKVYTLSVSGDRLIVGTAGRRVLVWDLRNMGYVQQRRESSLKYQTRCIRAFPNKQGYVLSSIEGRVAVEYLDPSPEVQKKKYAFKCHRLKENNIEQIYPVNAISFHNIHNTFATGGSDGFVNIWDPFNKKRLCQFHRYPTSIASLAFSNDGTTLAIASSYMYEMDDTEHPEDGIFIRQVTDAETKPKSPCT corresponding to amino acids 1-328 of BUB3_HUMAN (SEQ ID NO:1424), which also corresponds to amino acids 58-385 of H38804_PEA_1_P17 (SEQ ID NO:1308), wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a head of H38804_PEA_1_P17 (SEQ ID NO:1308), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MGRVRTLAGECSAQAQAQSLLAVVLSAPPSGGTPSARLSVRSPSPRDPWGLWAPVLQ (SEQ ID NO:1766) of H38804_PEA_1_P17 (SEQ ID NO:1308).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because one of the two signal-peptide prediction programs (HMM: Signal peptide, NN: NO) predicts that this protein has a signal peptide.

Variant protein H38804_PEA_1_P17 (SEQ ID NO:1308) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 214 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein H38804_PEA_1_P17 (SEQ ID NO:1308) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 214 Amino acid mutations
SNP position(s) on amino acid sequence   Alternative amino acid(s)   Previously known SNP?
126    H -> Y    No
129    S -> R    Yes
256    I ->      No
256    I -> N    No
258    G ->      No
266    D ->      No
266    D -> E    No
266    D -> N    Yes
296    A -> G    No
296    A -> V    No
306    F -> C    No
314    F ->      No
215    R -> K    No
361    T -> A    No
381    K ->      No
217    L ->      No
220    D ->      No
220    D -> E    No
245    F ->      No
245    F -> V    No
248    K ->      No
248    K -> Q    No

Variant protein H38804_PEA_1_P17 (SEQ ID NO:1308) is encoded by the following transcript(s): H38804_PEA_1_T24 (SEQ ID NO:29), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript H38804_PEA_1_T24 (SEQ ID NO:29) is shown in bold; this coding portion starts at position 475 and ends at position 1629. The transcript also has the following SNPs as listed in Table 215 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein H38804_PEA_1_P17 (SEQ ID NO:1308) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 215 Nucleic acid SNPs
SNP position on nucleotide sequence   Alternative nucleic acid   Previously known SNP?
161     C ->      No
167     C ->      No
1118    G -> A    No
1123    T ->      No
1134    C ->      No
1134    C -> A    No
1207    T ->      No
1207    T -> G    No
1216    A ->      No
1216    A -> C    No
1241    T ->      No
1241    T -> A    No
167     C -> A    No
1248    C ->      No
1248    C -> G    No
1270    G -> A    Yes
1272    C ->      No
1272    C -> A    No
1361    C -> G    No
1361    C -> T    No
1391    T -> G    No
1414    T ->      No
1419    A -> G    No
192     T ->      No
1555    A -> G    No
1615    A ->      No
1721    G ->      No
1918    T -> C    No
1918    T -> G    No
2029    T -> G    No
2260    T ->      No
2315    T ->      No
2316    T -> G    No
2363    T ->      No
208     C -> T    Yes
2530    -> G      No
2610    C ->      No
2610    C -> G    No
2642    T ->      No
2642    T -> G    No
2647    A ->      No
2647    A -> C    No
2683    C -> T    Yes
2821    T -> G    No
2871    G -> A    Yes
849     G -> T    No
2873    T -> A    No
2874    T -> A    No
2999    T -> G    No
3105    T -> G    No
3161    G -> A    Yes
850     C -> T    No
855     C -> T    Yes
861     T -> G    Yes
1098    T -> C    No
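The relationship between the nucleotide SNP positions of Table 215 and the amino acid positions of Table 214 can be illustrated with a minimal sketch. It assumes 1-based transcript coordinates and that the coding portion of H38804_PEA_1_T24 (positions 475 to 1629) is read in frame from its first nucleotide; the function name codon_index is hypothetical and is not part of the application.

```python
# Illustrative sketch only: maps a 1-based nucleotide position on a transcript
# to the 1-based codon (amino acid) number within the coding portion.
# Assumption: the coding portion is read in frame from its first nucleotide.

CDS_START = 475   # start of coding portion of H38804_PEA_1_T24 (from the text)
CDS_END = 1629    # end of coding portion of H38804_PEA_1_T24 (from the text)


def codon_index(nt_pos, cds_start=CDS_START, cds_end=CDS_END):
    """Return the codon number containing nt_pos, or None if outside the CDS."""
    if not cds_start <= nt_pos <= cds_end:
        return None
    return (nt_pos - cds_start) // 3 + 1


def coding_length_in_codons(cds_start=CDS_START, cds_end=CDS_END):
    """Number of complete codons spanned by the coding portion."""
    return (cds_end - cds_start + 1) // 3


if __name__ == "__main__":
    for pos in (1118, 1391, 1555, 1615, 1721):
        print(pos, codon_index(pos))
```

Under these assumptions, nucleotide positions 1118, 1391, 1555 and 1615 fall in codons 215, 306, 361 and 381, which agrees with the entries at those amino acid positions in Table 214, while position 1721 lies outside the coding portion.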

As noted above, cluster H38804 features 20 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster H38804_PEA_1_node_0 (SEQ ID NO:1172) according to the present invention is supported by 125 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H38804_PEA_1_T24 (SEQ ID NO:29) and H38804_PEA_1_T8 (SEQ ID NO:30). Table 216 below describes the starting and ending position of this segment on each transcript.

TABLE 216 Segment location on transcripts
Transcript name                   Segment starting position   Segment ending position
H38804_PEA_1_T24 (SEQ ID NO: 29)  1                           213
H38804_PEA_1_T8 (SEQ ID NO: 30)   1                           213

Segment cluster H38804_PEA_1_node_1 (SEQ ID NO:1173) according to the present invention is supported by 9 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H38804_PEA_1_T24 (SEQ ID NO:29) and H38804_PEA_1_T8 (SEQ ID NO:30). Table 217 below describes the starting and ending position of this segment on each transcript.

TABLE 217 Segment location on transcripts
Transcript name                   Segment starting position   Segment ending position
H38804_PEA_1_T24 (SEQ ID NO: 29)  214                         645
H38804_PEA_1_T8 (SEQ ID NO: 30)   214                         645

Segment cluster H38804_PEA_1_node_16 (SEQ ID NO:1174) according to the present invention is supported by 214 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H38804_PEA_1_T24 (SEQ ID NO:29) and H38804_PEA_1_T8 (SEQ ID NO:30). Table 218 below describes the starting and ending position of this segment on each transcript.

TABLE 218 Segment location on transcripts
Transcript name                   Segment starting position   Segment ending position
H38804_PEA_1_T24 (SEQ ID NO: 29)  1063                        1221
H38804_PEA_1_T8 (SEQ ID NO: 30)   1063                        1221

Segment cluster H38804_PEA_1_node_19 (SEQ ID NO:1175) according to the present invention is supported by 198 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H38804_PEA_1_T24 (SEQ ID NO:29) and H38804_PEA_1_T8 (SEQ ID NO:30). Table 219 below describes the starting and ending position of this segment on each transcript.

TABLE 219 Segment location on transcripts
Transcript name                   Segment starting position   Segment ending position
H38804_PEA_1_T24 (SEQ ID NO: 29)  1222                        1360
H38804_PEA_1_T8 (SEQ ID NO: 30)   1222                        1360

Segment cluster H38804_PEA_1_node_24 (SEQ ID NO:1176) according to the present invention is supported by 180 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H38804_PEA_1_T24 (SEQ ID NO:29) and H38804_PEA_1_T8 (SEQ ID NO:30). Table 220 below describes the starting and ending position of this segment on each transcript.

TABLE 220 Segment location on transcripts
Transcript name                   Segment starting position   Segment ending position
H38804_PEA_1_T24 (SEQ ID NO: 29)  1421                        1616
H38804_PEA_1_T8 (SEQ ID NO: 30)   1421                        1616

Segment cluster H38804_PEA_1_node_25 (SEQ ID NO:1177) according to the present invention is supported by 28 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H38804_PEA_1_T8 (SEQ ID NO:30). Table 221 below describes the starting and ending position of this segment on each transcript.

TABLE 221 Segment location on transcripts
Transcript name                   Segment starting position   Segment ending position
H38804_PEA_1_T8 (SEQ ID NO: 30)   1617                        1969

Segment cluster H38804_PEA_1_node_28 (SEQ ID NO:1178) according to the present invention is supported by 38 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H38804_PEA_1_T8 (SEQ ID NO:30). Table 222 below describes the starting and ending position of this segment on each transcript.

TABLE 222 Segment location on transcripts
Transcript name                   Segment starting position   Segment ending position
H38804_PEA_1_T8 (SEQ ID NO: 30)   2018                        2607

Segment cluster H38804_PEA_1_node_29 (SEQ ID NO:1179) according to the present invention is supported by 259 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H38804_PEA_1_T24 (SEQ ID NO:29) and H38804_PEA_1_T8 (SEQ ID NO:30). Table 223 below describes the starting and ending position of this segment on each transcript.

TABLE 223 Segment location on transcripts
Transcript name                   Segment starting position   Segment ending position
H38804_PEA_1_T24 (SEQ ID NO: 29)  1617                        2844
H38804_PEA_1_T8 (SEQ ID NO: 30)   2608                        3835
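Because a shared segment occupies different positions on the two transcripts, a position on one transcript can be converted to the other through the segment that contains it. The following is a minimal sketch of that conversion using only the coordinates of Tables 220 and 223; the dictionary layout and the function name convert_position are hypothetical and serve illustration only.

```python
# Illustrative sketch only: converts a 1-based position between transcripts
# via a segment shared by both. Coordinates are taken from Tables 220 and 223.

SEGMENTS = {
    "node_24": {"T24": (1421, 1616), "T8": (1421, 1616)},
    "node_29": {"T24": (1617, 2844), "T8": (2608, 3835)},
}


def convert_position(pos, src, dst, segments=SEGMENTS):
    """Map pos on transcript src to transcript dst, or None if not in a shared segment."""
    for location in segments.values():
        if src not in location or dst not in location:
            continue
        src_start, src_end = location[src]
        if src_start <= pos <= src_end:
            # Same offset from the start of the segment on both transcripts.
            return location[dst][0] + (pos - src_start)
    return None


if __name__ == "__main__":
    print(convert_position(1721, "T24", "T8"))
```

Under this sketch, position 1721 of H38804_PEA_1_T24 corresponds to position 2712 of H38804_PEA_1_T8, the same pair of positions at which Tables 215 and 213 each list a G SNP.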

Segment cluster H38804_PEA_1_node_30 (SEQ ID NO:1180) according to the present invention is supported by 169 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H38804_PEA_1_T24 (SEQ ID NO:29) and H38804_PEA_1_T8 (SEQ ID NO:30). Table 224 below describes the starting and ending position of this segment on each transcript.

TABLE 224 Segment location on transcripts
Transcript name                   Segment starting position   Segment ending position
H38804_PEA_1_T24 (SEQ ID NO: 29)  2845                        3170
H38804_PEA_1_T8 (SEQ ID NO: 30)   3836                        4161

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.

Segment cluster H38804_PEA_1_node_10 (SEQ ID NO:1181) according to the present invention is supported by 179 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H38804_PEA_1_T24 (SEQ ID NO:29) and H38804_PEA_1_T8 (SEQ ID NO:30). Table 225 below describes the starting and ending position of this segment on each transcript.

TABLE 225 Segment location on transcripts
Transcript name                   Segment starting position   Segment ending position
H38804_PEA_1_T24 (SEQ ID NO: 29)  841                         910
H38804_PEA_1_T8 (SEQ ID NO: 30)   841                         910

Segment cluster H38804_PEA_1_node_12 (SEQ ID NO:1182) according to the present invention is supported by 181 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H38804_PEA_1_T24 (SEQ ID NO:29) and H38804_PEA_1_T8 (SEQ ID NO:30). Table 226 below describes the starting and ending position of this segment on each transcript.

TABLE 226 Segment location on transcripts
Transcript name                   Segment starting position   Segment ending position
H38804_PEA_1_T24 (SEQ ID NO: 29)  911                         949
H38804_PEA_1_T8 (SEQ ID NO: 30)   911                         949

Segment cluster H38804_PEA_1_node_13 (SEQ ID NO:1183) according to the present invention is supported by 187 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H38804_PEA_1_T24 (SEQ ID NO:29) and H38804_PEA_1_T8 (SEQ ID NO:30). Table 227 below describes the starting and ending position of this segment on each transcript.

TABLE 227 Segment location on transcripts
Transcript name                   Segment starting position   Segment ending position
H38804_PEA_1_T24 (SEQ ID NO: 29)  950                         1028
H38804_PEA_1_T8 (SEQ ID NO: 30)   950                         1028

Segment cluster H38804_PEA_1_node_14 (SEQ ID NO:1184) according to the present invention is supported by 179 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H38804_PEA_1_T24 (SEQ ID NO:29) and H38804_PEA_1_T8 (SEQ ID NO:30). Table 228 below describes the starting and ending position of this segment on each transcript.

TABLE 228 Segment location on transcripts
Transcript name                   Segment starting position   Segment ending position
H38804_PEA_1_T24 (SEQ ID NO: 29)  1029                        1062
H38804_PEA_1_T8 (SEQ ID NO: 30)   1029                        1062

Segment cluster H38804_PEA_1_node_2 (SEQ ID NO:1185) according to the present invention is supported by 156 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H38804_PEA_1_T24 (SEQ ID NO:29) and H38804_PEA_1_T8 (SEQ ID NO:30). Table 229 below describes the starting and ending position of this segment on each transcript.

TABLE 229 Segment location on transcripts
Transcript name                   Segment starting position   Segment ending position
H38804_PEA_1_T24 (SEQ ID NO: 29)  646                         678
H38804_PEA_1_T8 (SEQ ID NO: 30)   646                         678

Segment cluster H38804_PEA_1_node_20 (SEQ ID NO:1186) according to the present invention is supported by 162 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H38804_PEA_1_T24 (SEQ ID NO:29) and H38804_PEA_1_T8 (SEQ ID NO:30). Table 230 below describes the starting and ending position of this segment on each transcript.

TABLE 230 Segment location on transcripts
Transcript name                   Segment starting position   Segment ending position
H38804_PEA_1_T24 (SEQ ID NO: 29)  1361                        1399
H38804_PEA_1_T8 (SEQ ID NO: 30)   1361                        1399

Segment cluster H38804_PEA_1_node_23 (SEQ ID NO:1187) according to the present invention can be found in the following transcript(s): H38804_PEA_1_T24 (SEQ ID NO:29) and H38804_PEA_1_T8 (SEQ ID NO:30). Table 231 below describes the starting and ending position of this segment on each transcript.

TABLE 231 Segment location on transcripts
Transcript name                   Segment starting position   Segment ending position
H38804_PEA_1_T24 (SEQ ID NO: 29)  1400                        1420
H38804_PEA_1_T8 (SEQ ID NO: 30)   1400                        1420

Segment cluster H38804_PEA_1_node_26 (SEQ ID NO:1188) according to the present invention is supported by 21 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H38804_PEA_1_T8 (SEQ ID NO:30). Table 232 below describes the starting and ending position of this segment on each transcript.

TABLE 232 Segment location on transcripts
Transcript name                   Segment starting position   Segment ending position
H38804_PEA_1_T8 (SEQ ID NO: 30)   1970                        2017

Segment cluster H38804_PEA_1_node_3 (SEQ ID NO:1189) according to the present invention is supported by 162 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H38804_PEA_1_T24 (SEQ ID NO:29) and H38804_PEA_1_T8 (SEQ ID NO:30). Table 233 below describes the starting and ending position of this segment on each transcript.

TABLE 233 Segment location on transcripts
Transcript name                   Segment starting position   Segment ending position
H38804_PEA_1_T24 (SEQ ID NO: 29)  679                         716
H38804_PEA_1_T8 (SEQ ID NO: 30)   679                         716

Segment cluster H38804_PEA_1_node_4 (SEQ ID NO:1190) according to the present invention is supported by 172 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H38804_PEA_1_T24 (SEQ ID NO:29) and H38804_PEA_1_T8 (SEQ ID NO:30). Table 234 below describes the starting and ending position of this segment on each transcript.

TABLE 234 Segment location on transcripts
Transcript name                   Segment starting position   Segment ending position
H38804_PEA_1_T24 (SEQ ID NO: 29)  717                         827
H38804_PEA_1_T8 (SEQ ID NO: 30)   717                         827

Segment cluster H38804_PEA_1_node_5 (SEQ ID NO:1191) according to the present invention can be found in the following transcript(s): H38804_PEA_1_T24 (SEQ ID NO:29) and H38804_PEA_1_T8 (SEQ ID NO:30). Table 235 below describes the starting and ending position of this segment on each transcript.

TABLE 235 Segment location on transcripts
Transcript name                   Segment starting position   Segment ending position
H38804_PEA_1_T24 (SEQ ID NO: 29)  828                         840
H38804_PEA_1_T8 (SEQ ID NO: 30)   828                         840
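The segment coordinates of Tables 216 to 235 can be checked for internal consistency: on each transcript the segments should follow one another without gaps or overlaps. The sketch below performs that check under the assumption of 1-based, inclusive coordinates; the function name tiles_contiguously and the list names are hypothetical.

```python
# Illustrative sketch only: verifies that the segment spans listed for a
# transcript start at position 1 and abut one another with no gap or overlap.
# Coordinates are copied from Tables 216 to 235 (1-based, inclusive).

SHARED_PREFIX = [
    (1, 213), (214, 645), (646, 678), (679, 716), (717, 827), (828, 840),
    (841, 910), (911, 949), (950, 1028), (1029, 1062), (1063, 1221),
    (1222, 1360), (1361, 1399), (1400, 1420), (1421, 1616),
]
T24_SPANS = SHARED_PREFIX + [(1617, 2844), (2845, 3170)]
T8_SPANS = SHARED_PREFIX + [
    (1617, 1969), (1970, 2017), (2018, 2607), (2608, 3835), (3836, 4161),
]


def tiles_contiguously(spans):
    """Return True if the spans cover 1..max end with no gaps or overlaps."""
    ordered = sorted(spans)
    if not ordered or ordered[0][0] != 1:
        return False
    return all(nxt[0] == cur[1] + 1 for cur, nxt in zip(ordered, ordered[1:]))


if __name__ == "__main__":
    print(tiles_contiguously(T24_SPANS), tiles_contiguously(T8_SPANS))
```

Both transcripts pass this check with the tabulated values, ending at positions 3170 and 4161 respectively.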

Variant protein alignment to the previously known protein:

Sequence name: /tmp/RR4oV8zYLg/QlORqeqpIp:BUB3_HUMAN (SEQ ID NO: 1424)
Sequence documentation:
Alignment of: H38804_PEA_1_P5 (SEQ ID NO: 1307) × BUB3_HUMAN (SEQ ID NO: 1424)
Alignment segment 1/1:
Quality: 3244.00
Escore: 0
Matching length: 324
Total length: 324
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

Sequence name: /tmp/Db0dQEpSuo/Lr8HPXaeBg:BUB3_HUMAN (SEQ ID NO: 1424)
Sequence documentation:
Alignment of: H38804_PEA_1_P17 (SEQ ID NO: 1308) × BUB3_HUMAN (SEQ ID NO: 1424) . . .
Alignment segment 1/1:
Quality: 3288.00
Escore: 0
Matching length: 328
Total length: 328
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

Description for Cluster HSENA78

Cluster HSENA78 features 1 transcript(s) and 7 segment(s) of interest, the names for which are given in Tables 236 and 237, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 238.

TABLE 236 Transcripts of interest
Transcript Name   Sequence ID No.
HSENA78_T5        31

TABLE 237 Segments of interest
Segment Name      Sequence ID No.
HSENA78_node_0    398
HSENA78_node_2    399
HSENA78_node_6    400
HSENA78_node_9    401
HSENA78_node_3    402
HSENA78_node_4    403
HSENA78_node_8    404

TABLE 238 Proteins of interest
Protein Name      Sequence ID No.
HSENA78_P2        1309

These sequences are variants of the known protein Small inducible cytokine B5 precursor (SwissProt accession identifier SZ05_HUMAN; known also according to the synonyms CXCL5; Epithelial-derived neutrophil activating protein 78; Neutrophil-activating peptide ENA-78), SEQ ID NO: 1425, referred to herein as the previously known protein.

Protein Small inducible cytokine B5 precursor (SEQ ID NO:1425) is known or believed to have the following function(s): Involved in neutrophil activation. The sequence for protein Small inducible cytokine B5 precursor is given at the end of the application, as “Small inducible cytokine B5 precursor amino acid sequence”. Protein Small inducible cytokine B5 precursor localization is believed to be Secreted.

The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: chemotaxis; signal transduction; cell-cell signaling; positive control of cell proliferation, which are annotation(s) related to Biological Process; and chemokine, which are annotation(s) related to Molecular Function.

The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <dot expasy dot ch/sprot/>; or Locuslink, available from <dot ncbi dot nlm dot nih dot gov/projects/LocusLink/>.

Cluster HSENA78 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term “number” in the right hand column of the table and the numbers on the y-axis of FIG. 24 refer to weighted expression of ESTs in each category, as “parts per million” (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
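The “parts per million” figure described above is a ratio scaled by one million. A minimal sketch of that arithmetic follows; it omits the weighting of ESTs, which is described elsewhere in the application, and the function name and the EST counts in the usage lines are hypothetical values chosen only to show the calculation.

```python
# Illustrative sketch only: unweighted parts-per-million expression of a
# cluster within one tissue category. The weighting scheme referred to in the
# text is not reproduced here, and all counts used below are hypothetical.


def parts_per_million(cluster_est_count, total_est_count):
    """Ratio of cluster ESTs to all ESTs in a category, scaled to one million."""
    if total_est_count <= 0:
        raise ValueError("total_est_count must be positive")
    return cluster_est_count * 1_000_000 / total_est_count


if __name__ == "__main__":
    # Hypothetical example: 19 cluster ESTs among 500,000 ESTs in a category.
    print(parts_per_million(19, 500_000))
```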

Overall, the following results were obtained as shown with regard to the histograms in FIG. 24 and Table 239. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: epithelial malignant tumors and lung malignant tumors.

TABLE 239 Normal tissue distribution
Name of Tissue   Number
colon            0
epithelial       2
general          38
kidney           0
lung             3
breast           8
skin             0
stomach          36
uterus           4

TABLE 240 P values and ratios for expression in cancerous tissue
Name of Tissue   P1        P2        SP1       R3    SP2       R4
colon            2.6e−01   3.3e−01   1.7e−01   2.7   2.7e−01   2.2
epithelial       2.5e−01   9.0e−02   3.2e−03   4.1   8.5e−07   5.5
general          8.4e−01   7.2e−01   1         0.3   1         0.4
kidney           1         7.2e−01   1         1.0   1.7e−01   1.9
lung             8.5e−01   4.8e−01   4.1e−01   1.9   4.0e−05   3.8
breast           9.5e−01   8.7e−01   1         0.8   6.8e−01   1.2
skin             2.9e−01   4.7e−01   1.4e−01   7.0   6.4e−01   1.6
stomach          5.0e−01   4.3e−01   7.5e−01   1.0   4.3e−01   1.3
uterus           7.1e−01   8.5e−01   6.6e−01   1.3   8.0e−01   1.0
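As an illustration of how the values in Table 240 might be screened, the sketch below selects tissues whose SP2 value falls under a cutoff. The cutoff of 0.05 and the use of SP2 alone are assumptions made for this example, not the criterion defined in the application, and the names TABLE_240 and select_tissues are hypothetical. Only the SP2 and R4 columns are copied from the table.

```python
# Illustrative sketch only: screens Table 240 rows by the SP2 column.
# Assumption: SP2 is treated as a P value and compared with an arbitrary
# cutoff of 0.05; this is not stated as the application's own criterion.

TABLE_240 = [
    # (tissue, SP2, R4)
    ("colon", 2.7e-01, 2.2),
    ("epithelial", 8.5e-07, 5.5),
    ("general", 1.0, 0.4),
    ("kidney", 1.7e-01, 1.9),
    ("lung", 4.0e-05, 3.8),
    ("breast", 6.8e-01, 1.2),
    ("skin", 6.4e-01, 1.6),
    ("stomach", 4.3e-01, 1.3),
    ("uterus", 8.0e-01, 1.0),
]


def select_tissues(rows, max_p=0.05):
    """Return tissue names whose SP2 value is below max_p, in table order."""
    return [tissue for tissue, sp2, _ratio in rows if sp2 < max_p]


if __name__ == "__main__":
    print(select_tissues(TABLE_240))
```

With this assumed cutoff the sketch returns the epithelial and lung rows, the same two categories that the text names as overexpressed for this cluster.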

As noted above, cluster HSENA78 features 1 transcript(s), which were listed in Table 236 above. These transcript(s) encode for protein(s) which are variant(s) of protein Small inducible cytokine B5 precursor (SEQ ID NO:1425). A description of each variant protein according to the present invention is now provided.

Variant protein HSENA78_P2 (SEQ ID NO:1309) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSENA78_T5 (SEQ ID NO:31). An alignment is given to the known protein (Small inducible cytokine B5 precursor (SEQ ID NO:1425)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HSENA78_P2 (SEQ ID NO:1309) and SZ05_HUMAN (SEQ ID NO:1425):

1. An isolated chimeric polypeptide encoding for HSENA78_P2 (SEQ ID NO:1309), comprising a first amino acid sequence being at least 90% homologous to MSLLSSRAARVPGPSSSLCALLVLLLLLTQPGPIASAGPAAAVLRELRCVCLQTTQGVHPKMISNLQVFAIGPQCSKVEVV corresponding to amino acids 1-81 of SZ05_HUMAN (SEQ ID NO:1425), which also corresponds to amino acids 1-81 of HSENA78_P2 (SEQ ID NO:1309).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein HSENA78_P2 (SEQ ID NO:1309) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 241 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSENA78_P2 (SEQ ID NO:1309) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 241 Amino acid mutations
SNP position(s) on amino acid sequence   Alternative amino acid(s)   Previously known SNP?
80    V ->    No
81    V ->    No

Variant protein HSENA78_P2 (SEQ ID NO:1309) is encoded by the following transcript(s): HSENA78_T5 (SEQ ID NO:31), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSENA78_T5 (SEQ ID NO:31) is shown in bold; this coding portion starts at position 149 and ends at position 391. The transcript also has the following SNPs as listed in Table 242 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSENA78_P2 (SEQ ID NO:1309) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 242 Nucleic acid SNPs
SNP position on nucleotide sequence   Alternative nucleic acid   Previously known SNP?
92      C -> T    Yes
144     C -> T    No
1151    A -> T    Yes
1389    T -> C    No
1867    C -> G    Yes
145     C -> T    No
181     C -> T    Yes
316     G -> A    Yes
388     G ->      No
390     T ->      No
605     T ->      No
972     C -> T    Yes
1105    A -> G    Yes

As noted above, cluster HSENA78 features 7 segment(s), which were listed in Table 237 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster HSENA78_node_0 (SEQ ID NO:1192) according to the present invention is supported by 24 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSENA78_T5 (SEQ ID NO:31). Table 243 below describes the starting and ending position of this segment on each transcript.

TABLE 243 Segment location on transcripts
Transcript name              Segment starting position   Segment ending position
HSENA78_T5 (SEQ ID NO: 31)   1                           257

Segment cluster HSENA78_node_2 (SEQ ID NO:1193) according to the present invention is supported by 22 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSENA78_T5 (SEQ ID NO:31). Table 244 below describes the starting and ending position of this segment on each transcript.

TABLE 244 Segment location on transcripts
Transcript name              Segment starting position   Segment ending position
HSENA78_T5 (SEQ ID NO: 31)   258                         390

Segment cluster HSENA78_node_6 (SEQ ID NO:1194) according to the present invention is supported by 68 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSENA78_T5 (SEQ ID NO:31). Table 245 below describes the starting and ending position of this segment on each transcript.

TABLE 245 Segment location on transcripts
Transcript name              Segment starting position   Segment ending position
HSENA78_T5 (SEQ ID NO: 31)   585                         2370

Segment cluster HSENA78_node_9 (SEQ ID NO:1195) according to the present invention is supported by 28 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSENA78_T5 (SEQ ID NO:31). Table 246 below describes the starting and ending position of this segment on each transcript.

TABLE 246 Segment location on transcripts
Transcript name              Segment starting position   Segment ending position
HSENA78_T5 (SEQ ID NO: 31)   2394                        2546

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.

Segment cluster HSENA78_node_3 (SEQ ID NO:1196) according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSENA78_T5 (SEQ ID NO:31). Table 247 below describes the starting and ending position of this segment on each transcript.

TABLE 247 Segment location on transcripts
Transcript name              Segment starting position   Segment ending position
HSENA78_T5 (SEQ ID NO: 31)   391                         500

Segment cluster HSENA78_node_4 (SEQ ID NO:1197) according to the present invention is supported by 17 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSENA78_T5 (SEQ ID NO:31). Table 248 below describes the starting and ending position of this segment on each transcript.

TABLE 248 Segment location on transcripts
Transcript name              Segment starting position   Segment ending position
HSENA78_T5 (SEQ ID NO: 31)   501                         584

Segment cluster HSENA78_node_8 (SEQ ID NO:1198) according to the present invention can be found in the following transcript(s): HSENA78_T5 (SEQ ID NO:31). Table 249 below describes the starting and ending position of this segment on each transcript.

TABLE 249 Segment location on transcripts
Transcript name              Segment starting position   Segment ending position
HSENA78_T5 (SEQ ID NO: 31)   2371                        2393

Variant protein alignment to the previously known protein:

Sequence name: /tmp/5kiQY6MxWx/pLnTrxsCqk:SZ05_HUMAN (SEQ ID NO: 1425)
Sequence documentation:
Alignment of: HSENA78_P2 (SEQ ID NO: 1309) × SZ05_HUMAN (SEQ ID NO: 1425)
Alignment segment 1/1:
Quality: 767.00
Escore: 0
Matching length: 81
Total length: 81
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

Description for Cluster HUMODCA

Cluster HUMODCA features 1 transcript(s) and 17 segment(s) of interest, the names for which are given in Tables 250 and 251, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 252.

TABLE 250 Transcripts of interest
Transcript Name   Sequence ID No.
HUMODCA_T17       32

TABLE 251 Segments of interest
Segment Name      Sequence ID No.
HUMODCA_node_1    405
HUMODCA_node_25   406
HUMODCA_node_32   407
HUMODCA_node_36   408
HUMODCA_node_39   409
HUMODCA_node_41   410
HUMODCA_node_0    411
HUMODCA_node_10   412
HUMODCA_node_12   413
HUMODCA_node_13   414
HUMODCA_node_2    415
HUMODCA_node_27   416
HUMODCA_node_3    417
HUMODCA_node_30   418
HUMODCA_node_34   419
HUMODCA_node_38   420
HUMODCA_node_40   421

TABLE 252 Proteins of interest
Protein Name      Sequence ID No.
HUMODCA_P9        1310

These sequences are variants of the known protein Ornithine decarboxylase (SwissProt accession identifier DCOR_HUMAN; known also according to the synonyms EC 4.1.1.17; ODC), SEQ ID NO: 1426, referred to herein as the previously known protein.

Protein Ornithine decarboxylase (SEQ ID NO:1426) is known or believed to have the following function(s): Polyamine biosynthesis; first (rate-limiting) step. The sequence for protein Ornithine decarboxylase (SEQ ID NO:1426) is given at the end of the application, as “Ornithine decarboxylase (SEQ ID NO:1426) amino acid sequence”. Known polymorphisms for this sequence are as shown in Table 253.

TABLE 253 Amino acid mutations for Known Protein
SNP position(s) on amino acid sequence   Comment
415                                      Q −> E

The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: polyamine biosynthesis, which are annotation(s) related to Biological Process; and ornithine decarboxylase; lyase, which are annotation(s) related to Molecular Function.

The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <dot expasy dot ch/sprot/>; or Locuslink, available from <dot ncbi dot nlm dot nih dot gov/projects/LocusLink/>.

Cluster HUMODCA can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term “number” in the right hand column of the table and the numbers on the y-axis of FIG. 25 refer to weighted expression of ESTs in each category, as “parts per million” (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).

Overall, the following results were obtained as shown with regard to the histograms in FIG. 25 and Table 254. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: brain malignant tumors, colorectal cancer, epithelial malignant tumors and a mixture of malignant tumors from different tissues.

TABLE 254 Normal tissue distribution
Name of Tissue    Number
adrenal           120
bladder           82
bone              161
brain             53
colon             0
epithelial        107
general           94
head and neck     10
kidney            114
liver             107
lung              120
lymph nodes       165
breast            61
bone marrow       156
muscle            55
ovary             36
pancreas          102
prostate          140
skin              188
stomach           109
T cells           278
Thyroid           128
uterus            118

TABLE 255 P values and ratios for expression in cancerous tissue
Name of Tissue    P1        P2        SP1       R3    SP2       R4
adrenal           8.3e−01   7.8e−01   1         0.2   8.5e−01   0.7
bladder           5.4e−01   5.1e−01   6.2e−01   1.1   5.0e−01   1.1
bone              8.3e−01   3.2e−01   1         0.2   8.4e−01   0.7
brain             2.6e−01   3.8e−02   6.5e−04   2.8   8.7e−10   3.6
colon             2.2e−02   5.8e−03   1.5e−03   6.9   6.7e−05   9.9
epithelial        6.4e−02   2.7e−03   1.4e−03   1.5   1.6e−12   2.1
general           1.3e−03   5.4e−08   1.9e−08   1.7   1.4e−39   2.6
head and neck     1.7e−01   1.7e−01   1         1.2   7.5e−01   1.3
kidney            7.7e−01   7.6e−01   7.1e−01   0.8   6.6e−01   0.9
liver             7.3e−01   5.7e−01   1         0.3   2.4e−01   1.2
lung              7.8e−01   5.8e−01   7.6e−01   0.6   7.3e−04   1.7
lymph nodes       3.9e−01   2.5e−01   1.8e−01   1.1   1.4e−04   2.1
breast            7.8e−01   4.7e−01   7.7e−01   0.8   6.4e−01   1.0
bone marrow       3.4e−01   2.6e−01   2.8e−01   2.1   1.6e−01   1.2
muscle            8.5e−01   6.1e−01   1         0.2   7.1e−05   1.0
ovary             1.7e−01   9.3e−02   3.8e−01   1.7   2.2e−02   2.6
pancreas          2.2e−01   3.2e−01   5.7e−02   1.6   6.6e−03   1.5
prostate          5.0e−01   4.9e−01   3.8e−02   1.9   4.5e−02   1.7
skin              6.2e−01   5.8e−01   5.4e−02   0.9   1.5e−02   0.5
stomach           4.2e−01   2.6e−01   3.7e−01   0.7   7.3e−03   2.3
T cells           1         1         5.5e−01   1.5   8.1e−01   0.9
Thyroid           8.3e−02   8.3e−02   5.9e−01   1.3   5.9e−01   1.3
uterus            4.2e−01   2.4e−01   1.6e−01   1.2   4.9e−02   1.7

As noted above, cluster HUMODCA features 1 transcript(s), which were listed in Table 250 above. These transcript(s) encode for protein(s) which are variant(s) of protein Ornithine decarboxylase (SEQ ID NO:1426). A description of each variant protein according to the present invention is now provided.

Variant protein HUMODCA_P9 (SEQ ID NO:1310) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMODCA_T17 (SEQ ID NO:32). An alignment is given to the known protein (Ornithine decarboxylase (SEQ ID NO:1426)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HUMODCA_P9 (SEQ ID NO:1310) and DCOR_HUMAN (SEQ ID NO:1426):

1. An isolated chimeric polypeptide encoding for HUMODCA_P9 (SEQ ID NO:1310), comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MKSLTATSSMKVLLPRTFWTRKLMKFLLL (SEQ ID NO: 1768) corresponding to amino acids 1-29 of HUMODCA_P9 (SEQ ID NO:1310), and a second amino acid sequence being at least 90% homologous to LVLRIATDDSKAVCRLSVKFGATLRTSRLLLERAKELNIDVVGVSFHVGSGCTDPETFVQAISDARCVFDMGAEVGFSMYLLDIGGGFPGSEDVKLKFEEITGVINPALDKYFPSDSGVRIIAEPGRYYVASAFTLAVNIIAKKIVLKEQTGSDDEDESSEQTFMYYVNDGVYGSFNCILYDHAHVLPLLQKRPKPDEKYYSSSIWGPTCDGLDRIVERCDLPEMHVGDWMLFENMGAYTVAAASTFNGFQRPTIYYVMSGPAWQLMQQFQNPDFPPEVEEQDASTLPVSCAWESGMKRHRAACASASINV corresponding to amino acids 151-461 of DCOR_HUMAN (SEQ ID NO:1426), which also corresponds to amino acids 30-340 of HUMODCA_P9 (SEQ ID NO:1310), wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a head of HUMODCA_P9 (SEQ ID NO:1310), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MKSLTATSSMKVLLPRTFWTRKLMKFLLL (SEQ ID NO: 1768) of HUMODCA_P9 (SEQ ID NO:1310).

Comparison report between HUMODCA_P9 (SEQ ID NO:1310) and AAA59968 (SEQ ID NO:1702):

1. An isolated chimeric polypeptide encoding for HUMODCA_P9 (SEQ ID NO:1310), comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MKSLTATSSMKVLLPRTFWTRKLMKFLLL (SEQ ID NO: 1768) corresponding to amino acids 1-29 of HUMODCA_P9 (SEQ ID NO:1310), and a second amino acid sequence being at least 90% homologous to LVLRIATDDSKAVCRLSVKFGATLRTSRLLLERAKELNIDVVGVSFHVGSGCTDPETFVQAISDARCVFDMGAEVGFSMYLLDIGGGFPGSEDVKLKFEEITGVINPALDKYFPSDSGVRIIAEPGRYYVASAFTLAVNIIAKKIVLKEQTGSDDEDESSEQTFMYYVNDGVYGSFNCILYDHAHVKPLLQKRPKPDEKYYSSSIWGPTCDGLDRIVERCDLPEMHVGDWMLFENMGAYTVAAASTFNGFQRPTIYYVMSGPAWQLMQQFQNPDFPPEVEEQDASTLPVSCAWESGMKRHRAACASASINV corresponding to amino acids 40-350 of AAA59968, which also corresponds to amino acids 30-340 of HUMODCA_P9 (SEQ ID NO:1310), wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a head of HUMODCA_P9 (SEQ ID NO:1310), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MKSLTATSSMKVLLPRTFWTRKLMKFLLL (SEQ ID NO: 1768) of HUMODCA_P9 (SEQ ID NO:1310).

Comparison report between HUMODCA_P9 (SEQ ID NO:1310) and AAH14562 (SEQ ID NO:1703):

1. An isolated chimeric polypeptide encoding for HUMODCA_P9 (SEQ ID NO:1310), comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MKSLTATSSMKVLLPRTFWTRKLMKFLLL (SEQ ID NO: 1768) corresponding to amino acids 1-29 of HUMODCA_P9 (SEQ ID NO:1310), and a second amino acid sequence being at least 90% homologous to LVLRIATDDSKAVCRLSVKFGATLRTSRLLLERAKELNIDVVGVSFHVGSGCTDPETFVQAISDARCVFDMGAEVGFSMYLLDIGGGFPGSEDVKLKFEEITGVINPALDKYFPSDSGVRIIAEPGRYYVASAFTLAVNIIAKKIVLKEQTGSDDEDESSEQTFMYYVNDGVYGSFNCILYDHAHVKPLLQKRPKPDEKYYSSSIWGPTCDGLDRIVERCDLPEMHVGDWMLFENMGAYTVAAASTFNGFQRPTIYYVMSGPAWQLMQQFQNPDFPPEVEEQDASTLPVSCAWESGMKRHRAACASASINV corresponding to amino acids 86-396 of AAH14562 (SEQ ID NO:1703), which also corresponds to amino acids 30-340 of HUMODCA_P9 (SEQ ID NO:1310), wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a head of HUMODCA_P9 (SEQ ID NO:1310), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MKSLTATSSMKVLLPRTFWTRKLMKFLLL (SEQ ID NO: 1768) of HUMODCA_P9 (SEQ ID NO:1310).
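By way of non-limiting illustration only, the coordinate relationships stated in the three comparison reports above can be checked with a short sketch. The residue ranges are those given in the reports; the function and variable names are hypothetical, and the sketch assumes (consistent with the ungapped alignments reported) that the shared region maps position-for-position.

```python
# Shared region of HUMODCA_P9: amino acids 30-340 (after the unique head, 1-29).
VARIANT_SHARED = (30, 340)

# First residue of the corresponding region on each aligned protein,
# as stated in the comparison reports.
KNOWN_SHARED_START = {"DCOR_HUMAN": 151, "AAA59968": 40, "AAH14562": 86}


def variant_to_known(variant_pos: int, known_name: str) -> int:
    """Map a HUMODCA_P9 residue number onto the named aligned protein."""
    start, end = VARIANT_SHARED
    if not start <= variant_pos <= end:
        raise ValueError("position is outside the shared region (30-340)")
    return KNOWN_SHARED_START[known_name] + (variant_pos - start)


# The last shared residue should land on the stated end of each range.
for name, stated_end in (("DCOR_HUMAN", 461), ("AAA59968", 350), ("AAH14562", 396)):
    print(name, variant_to_known(340, name) == stated_end)
```

Each of the three ranges spans 311 residues, which matches the matching length reported in the alignments at the end of this cluster description.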

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
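By way of non-limiting illustration only, the reasoning stated above for this variant can be written as a small predicate. This sketch encodes only the criterion given in the preceding paragraph; it is not the actual decision procedure of any prediction program, and the function name and boolean inputs are hypothetical.

```python
from typing import Iterable


def consistent_with_secretion(
    signal_peptide_calls: Iterable[bool],
    transmembrane_calls: Iterable[bool],
) -> bool:
    """True when every signal-peptide predictor reports a signal peptide
    and no trans-membrane predictor reports a trans-membrane region."""
    return all(signal_peptide_calls) and not any(transmembrane_calls)


# The situation described for HUMODCA_P9: two positive signal-peptide calls,
# two negative trans-membrane calls.
print(consistent_with_secretion([True, True], [False, False]))
```

This is the strict form of the test (unanimous predictors); a looser rule would accept a single positive signal-peptide call.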

Variant protein HUMODCA_P9 (SEQ ID NO:1310) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 256, (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMODCA_P9 (SEQ ID NO:1310) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 256 Amino acid mutations
SNP position(s) on amino acid sequence    Alternative amino acid(s)    Previously known SNP?
150                                       I -> S                       No
150                                       I -> V                       No
262                                       F -> L                       No
263                                       E ->                         No
263                                       E -> G                       No
30                                        L ->                         No
301                                       N ->                         No
301                                       N -> K                       No
309                                       E -> K                       No
312                                       D -> N                       No
323                                       E -> K                       No
329                                       H -> P                       No
174                                       I ->                         No
34                                        I ->                         No
59                                        L ->                         No
70                                        V ->                         No
86                                        T ->                         No
86                                        T -> N                       No
90                                        A ->                         No
94                                        A ->                         No
97                                        V ->                         No
97                                        V -> G                       No
198                                       N -> D                       No
200                                       G ->                         No
3                                         S ->                         No
207                                       C -> G                       No
207                                       C -> R                       No
223                                       P ->                         No
262                                       F ->                         No

Variant protein HUMODCA_P9 (SEQ ID NO:1310) is encoded by the following transcript(s): HUMODCA_T17 (SEQ ID NO:32), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMODCA_T17 (SEQ ID NO:32) is shown in bold; this coding portion starts at position 528 and ends at position 1547. The transcript also has the following SNPs as listed in Table 257 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMODCA_P9 (SEQ ID NO:1310) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 257 Nucleic acid SNPs
SNP position on nucleotide sequence    Alternative nucleic acid    Previously known SNP?
28                                     C -> G                      Yes
210                                    C ->                        No
536                                    T ->                        No
615                                    T ->                        No
628                                    T ->                        No
703                                    T ->                        No
736                                    T ->                        No
784                                    C ->                        No
784                                    C -> A                      No
797                                    A ->                        No
797                                    A -> T                      No
808                                    C ->                        No
217                                    C ->                        No
817                                    T ->                        No
817                                    T -> G                      No
869                                    C -> T                      Yes
975                                    A -> G                      No
976                                    T -> G                      No
1048                                   T ->                        No
1119                                   A -> G                      No
1127                                   C ->                        No
1127                                   C -> G                      No
1146                                   T -> C                      No
366                                    G -> C                      No
1146                                   T -> G                      No
1194                                   C ->                        No
1283                                   T -> C                      Yes
1311                                   T ->                        No
1311                                   T -> C                      No
1315                                   A ->                        No
1315                                   A -> G                      No
1430                                   C ->                        No
1430                                   C -> A                      No
1433                                   C -> G                      No
366                                    G -> T                      No
1433                                   C -> T                      Yes
1452                                   G -> A                      No
1461                                   G -> A                      No
1494                                   G -> A                      No
1513                                   A -> C                      No
1632                                   T ->                        No
1673                                   C ->                        No
1739                                   T ->                        No
1739                                   T -> G                      No
1742                                   T -> C                      No
447                                    G -> A                      Yes
1786                                   C ->                        No
1786                                   C -> G                      No
1832                                   T -> C                      Yes
1877                                   C -> T                      No
464                                    T -> G                      Yes
473                                    A -> G                      Yes
506                                    G -> A                      Yes
521                                    T ->                        No
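By way of non-limiting illustration only, the relationship between the nucleotide positions of Table 257 and the amino acid positions of Table 256 can be sketched from the stated coding portion (positions 528 to 1547 of HUMODCA_T17). The function name is hypothetical; the sketch assumes 1-based positions and a coding portion read in frame from its first nucleotide.

```python
from typing import Optional

CDS_START = 528   # first nucleotide of the coding portion, as stated above
CDS_END = 1547    # last nucleotide of the coding portion, as stated above


def codon_number(nt_pos: int, cds_start: int = CDS_START,
                 cds_end: int = CDS_END) -> Optional[int]:
    """Return the 1-based codon (amino acid) number containing nt_pos,
    or None when the position lies outside the coding portion."""
    if not cds_start <= nt_pos <= cds_end:
        return None
    return (nt_pos - cds_start) // 3 + 1


# A few positions from Table 257; 1632 lies downstream of the coding portion.
for nt in (536, 615, 975, 976, 1632):
    print(nt, codon_number(nt))
```

Under these assumptions, nucleotide positions 975 and 976 both fall in codon 150, in agreement with the two entries for amino acid position 150 in Table 256, and the coding portion of 1020 nucleotides corresponds to 340 codons.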

As noted above, cluster HUMODCA features 17 segment(s), which were listed in Table 251 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster HUMODCA_node_1 (SEQ ID NO:1199) according to the present invention is supported by 76 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMODCA_T17 (SEQ ID NO:32). Table 258 below describes the starting and ending position of this segment on each transcript.

TABLE 258 Segment location on transcripts
Transcript name                Segment starting position    Segment ending position
HUMODCA_T17 (SEQ ID NO: 32)    118                          256

Segment cluster HUMODCA_node_25 (SEQ ID NO:1200) according to the present invention is supported by 190 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMODCA_T17 (SEQ ID NO:32). Table 259 below describes the starting and ending position of this segment on each transcript.

TABLE 259 Segment location on transcripts
Transcript name                Segment starting position    Segment ending position
HUMODCA_T17 (SEQ ID NO: 32)    614                          748

Segment cluster HUMODCA_node_32 (SEQ ID NO:1201) according to the present invention is supported by 249 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMODCA_T17 (SEQ ID NO:32). Table 260 below describes the starting and ending position of this segment on each transcript.

TABLE 260 Segment location on transcripts
Transcript name                Segment starting position    Segment ending position
HUMODCA_T17 (SEQ ID NO: 32)    915                          1077

Segment cluster HUMODCA_node_36 (SEQ ID NO:1202) according to the present invention is supported by 348 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMODCA_T17 (SEQ ID NO:32). Table 261 below describes the starting and ending position of this segment on each transcript.

TABLE 261 Segment location on transcripts
Transcript name                Segment starting position    Segment ending position
HUMODCA_T17 (SEQ ID NO: 32)    1191                         1405

Segment cluster HUMODCA_node_39 (SEQ ID NO:1203) according to the present invention is supported by 297 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMODCA_T17 (SEQ ID NO:32). Table 262 below describes the starting and ending position of this segment on each transcript.

TABLE 262 Segment location on transcripts
Transcript name                Segment starting position    Segment ending position
HUMODCA_T17 (SEQ ID NO: 32)    1461                         1633

Segment cluster HUMODCA_node_41 (SEQ ID NO:1204) according to the present invention is supported by 230 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMODCA_T17 (SEQ ID NO:32). Table 263 below describes the starting and ending position of this segment on each transcript.

TABLE 263 Segment location on transcripts
Transcript name                Segment starting position    Segment ending position
HUMODCA_T17 (SEQ ID NO: 32)    1728                         1893

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.

Segment cluster HUMODCA_node_0 (SEQ ID NO:1205) according to the present invention is supported by 9 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMODCA_T17 (SEQ ID NO:32). Table 264 below describes the starting and ending position of this segment on each transcript.

TABLE 264 Segment location on transcripts
Transcript name                Segment starting position    Segment ending position
HUMODCA_T17 (SEQ ID NO: 32)    1                            117

Segment cluster HUMODCA_node_10 (SEQ ID NO:1206) according to the present invention is supported by 107 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMODCA_T17 (SEQ ID NO:32). Table 265 below describes the starting and ending position of this segment on each transcript.

TABLE 265 Segment location on transcripts
Transcript name                Segment starting position    Segment ending position
HUMODCA_T17 (SEQ ID NO: 32)    385                          494

Segment cluster HUMODCA_node_12 (SEQ ID NO:1207) according to the present invention is supported by 132 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMODCA_T17 (SEQ ID NO:32). Table 266 below describes the starting and ending position of this segment on each transcript.

TABLE 266 Segment location on transcripts
Transcript name                Segment starting position    Segment ending position
HUMODCA_T17 (SEQ ID NO: 32)    495                          586

Segment cluster HUMODCA_node_13 (SEQ ID NO:1208) according to the present invention is supported by 126 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMODCA_T17 (SEQ ID NO:32). Table 267 below describes the starting and ending position of this segment on each transcript.

TABLE 267 Segment location on transcripts
Transcript name                Segment starting position    Segment ending position
HUMODCA_T17 (SEQ ID NO: 32)    587                          613

Segment cluster HUMODCA_node_2 (SEQ ID NO:1209) according to the present invention is supported by 81 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMODCA_T17 (SEQ ID NO:32). Table 268 below describes the starting and ending position of this segment on each transcript.

TABLE 268 Segment location on transcripts
Transcript name                Segment starting position    Segment ending position
HUMODCA_T17 (SEQ ID NO: 32)    257                          328

Segment cluster HUMODCA_node_27 (SEQ ID NO:1210) according to the present invention is supported by 185 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMODCA_T17 (SEQ ID NO:32). Table 269 below describes the starting and ending position of this segment on each transcript.

TABLE 269 Segment location on transcripts
Transcript name                Segment starting position    Segment ending position
HUMODCA_T17 (SEQ ID NO: 32)    749                          830

Segment cluster HUMODCA_node_3 (SEQ ID NO:1211) according to the present invention is supported by 85 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMODCA_T17 (SEQ ID NO:32). Table 270 below describes the starting and ending position of this segment on each transcript.

TABLE 270 Segment location on transcripts
Transcript name                Segment starting position    Segment ending position
HUMODCA_T17 (SEQ ID NO: 32)    329                          384

Segment cluster HUMODCA_node_30 (SEQ ID NO:1212) according to the present invention is supported by 196 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMODCA_T17 (SEQ ID NO:32). Table 271 below describes the starting and ending position of this segment on each transcript.

TABLE 271 Segment location on transcripts
Transcript name                Segment starting position    Segment ending position
HUMODCA_T17 (SEQ ID NO: 32)    831                          914

Segment cluster HUMODCA_node_34 (SEQ ID NO:1213) according to the present invention is supported by 259 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMODCA_T17 (SEQ ID NO:32). Table 272 below describes the starting and ending position of this segment on each transcript.

TABLE 272 Segment location on transcripts
Transcript name                Segment starting position    Segment ending position
HUMODCA_T17 (SEQ ID NO: 32)    1078                         1190

Segment cluster HUMODCA_node_38 (SEQ ID NO:1214) according to the present invention is supported by 272 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMODCA_T17 (SEQ ID NO:32). Table 273 below describes the starting and ending position of this segment on each transcript.

TABLE 273 Segment location on transcripts
Transcript name                Segment starting position    Segment ending position
HUMODCA_T17 (SEQ ID NO: 32)    1406                         1460

Segment cluster HUMODCA_node_40 (SEQ ID NO:1215) according to the present invention is supported by 239 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMODCA_T17 (SEQ ID NO:32). Table 274 below describes the starting and ending position of this segment on each transcript.

TABLE 274 Segment location on transcripts
Transcript name                Segment starting position    Segment ending position
HUMODCA_T17 (SEQ ID NO: 32)    1634                         1727
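By way of non-limiting illustration only, the segment coordinates of Tables 258 to 274 can be checked for consistency with a short sketch. The start and end values below are read from those tables; the function names and the 120-nucleotide cutoff parameter are hypothetical conveniences, the cutoff mirroring the "up to about 120 bp" wording used for the short segments.

```python
# (start, end) of each segment on HUMODCA_T17, read from Tables 258-274.
SEGMENTS = {
    "HUMODCA_node_0": (1, 117),
    "HUMODCA_node_1": (118, 256),
    "HUMODCA_node_2": (257, 328),
    "HUMODCA_node_3": (329, 384),
    "HUMODCA_node_10": (385, 494),
    "HUMODCA_node_12": (495, 586),
    "HUMODCA_node_13": (587, 613),
    "HUMODCA_node_25": (614, 748),
    "HUMODCA_node_27": (749, 830),
    "HUMODCA_node_30": (831, 914),
    "HUMODCA_node_32": (915, 1077),
    "HUMODCA_node_34": (1078, 1190),
    "HUMODCA_node_36": (1191, 1405),
    "HUMODCA_node_38": (1406, 1460),
    "HUMODCA_node_39": (1461, 1633),
    "HUMODCA_node_40": (1634, 1727),
    "HUMODCA_node_41": (1728, 1893),
}


def tiles_contiguously(segments):
    """True when the segments, sorted by start, abut with no gap or overlap."""
    ordered = sorted(segments.values())
    return all(nxt[0] == prev[1] + 1 for prev, nxt in zip(ordered, ordered[1:]))


def short_segments(segments, max_len=120):
    """Names of segments whose length (end - start + 1) is at most max_len."""
    return sorted(n for n, (s, e) in segments.items() if e - s + 1 <= max_len)


print(tiles_contiguously(SEGMENTS))
print(len(short_segments(SEGMENTS)))
```

On these values the seventeen segments cover positions 1 through 1893 of the transcript without gaps, and the eleven segments of at most 120 nucleotides are the same ones presented in the separate short-segment description.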

Variant protein alignment to the previously known protein:

Sequence name: /tmp/y03EwE6iOl/dRQ5l2K6e2:DCOR_HUMAN (SEQ ID NO: 1426)
Sequence documentation:
Alignment of: HUMODCA_P9 (SEQ ID NO: 1310) × DCOR_HUMAN (SEQ ID NO: 1426)
Alignment segment 1/1:
Quality: 3056.00
Escore: 0
Matching length: 311
Total length: 311
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

Sequence name: /tmp/y03EwE6i01/dRQ5l2K6e2:AAA59968
Sequence documentation:
Alignment of: HUMODCA_P9 (SEQ ID NO: 1310) × AAA59968 . . .
Alignment segment 1/1:
Quality: 3056.00
Escore: 0
Matching length: 311
Total length: 311
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

Sequence name: /tmp/y03EwE6i01/dRQ5l2K6e2:AAH14562 (SEQ ID NO: 1703)
Sequence documentation:
Alignment of: HUMODCA_P9 (SEQ ID NO: 1310) × AAH14562 (SEQ ID NO: 1703) . . .
Alignment segment 1/1:
Quality: 3056.00
Escore: 0
Matching length: 311
Total length: 311
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

Description for Cluster R00299

Cluster R00299 features 1 transcript(s) and 12 segment(s) of interest, the names for which are given in Tables 275 and 276, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 277.

TABLE 275 Transcripts of interest
Transcript name    Sequence ID No.
R00299_T2          33

TABLE 276 Segments of interest
Segment Name      Sequence ID No.
R00299_node_2     422
R00299_node_30    423
R00299_node_10    424
R00299_node_14    425
R00299_node_15    426
R00299_node_20    427
R00299_node_23    428
R00299_node_25    429
R00299_node_28    430
R00299_node_31    431
R00299_node_5     432
R00299_node_9     433

TABLE 277 Proteins of interest
Protein Name    Sequence ID No.
R00299_P3       1311

These sequences are variants of the known protein Tescalcin (SwissProt accession identifier TESC_HUMAN; known also according to the synonyms TSC), SEQ ID NO: 1427, referred to herein as the previously known protein.

Protein Tescalcin (SEQ ID NO:1427) is known or believed to have the following function(s): Binds calcium. The sequence for protein Tescalcin is given at the end of the application, as “Tescalcin amino acid sequence”.

The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: calcium binding, which are annotation(s) related to Molecular Function.

The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <dot expasy dot ch/sprot/>; or LocusLink, available from <dot ncbi dot nlm dot nih dot gov/projects/LocusLink/>.

Cluster R00299 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term “number” in the right hand column of the table and the numbers on the y-axis of FIG. 26 below refer to weighted expression of ESTs in each category, as “parts per million” (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).

Overall, the following results were obtained as shown with regard to the histograms in FIG. 26 and Table 278. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: lung malignant tumors.

TABLE 278 Normal tissue distribution
Name of Tissue    Number
bone              0
colon             0
epithelial        11
general           11
liver             0
lung              10
lymph nodes       22
bone marrow       31
ovary             0
pancreas          14
prostate          16
stomach           76
T cells           0
Thyroid           0

TABLE 279 P values and ratios for expression in cancerous tissue
Name of Tissue    P1        P2        SP1       R3    SP2       R4
bone              1         6.7e−01   1         1.0   7.0e−01   1.4
colon             5.0e−02   5.3e−02   2.4e−01   2.8   2.1e−01   2.8
epithelial        7.7e−02   9.5e−02   4.0e−01   1.3   6.1e−03   1.9
general           2.3e−01   2.6e−01   5.3e−01   1.0   2.6e−04   1.9
liver             1         4.5e−01   1         1.0   6.9e−01   1.5
lung              4.9e−01   2.7e−01   6.5e−01   1.7   5.6e−04   3.8
lymph nodes       8.5e−01   8.7e−01   1         0.5   2.0e−01   1.1
bone marrow       8.6e−01   8.5e−01   1         0.5   2.3e−01   1.4
ovary             4.0e−01   4.4e−01   1         1.1   1         1.1
pancreas          7.2e−01   6.9e−01   6.7e−01   1.0   3.5e−01   1.5
prostate          8.7e−01   9.1e−01   6.7e−01   1.0   7.5e−01   0.9
stomach           6.6e−01   7.5e−01   1         0.4   6.7e−01   0.7
T cells           1         6.7e−01   1         1.0   5.2e−01   1.8
Thyroid           1.8e−01   1.8e−01   6.7e−01   1.6   6.7e−01   1.6

As noted above, cluster R00299 features 1 transcript(s), which were listed in Table 275 above. These transcript(s) encode for protein(s) which are variant(s) of protein Tescalcin (SEQ ID NO:1427). A description of each variant protein according to the present invention is now provided.

Variant protein R00299_P3 (SEQ ID NO:1311) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R00299_T2 (SEQ ID NO:33). An alignment is given to the known protein (Tescalcin (SEQ ID NO:1427)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between R00299_P3 (SEQ ID NO:1311) and Q9NWT9 (SEQ ID NO:1704):

1. An isolated chimeric polypeptide encoding for R00299_P3 (SEQ ID NO:1311), comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MAEKALLCPSSAGLGTWPWVLNSAWPVLPLAVDQGVDWRPRGPV (SEQ ID NO: 1769) corresponding to amino acids 1-44 of R00299_P3 (SEQ ID NO:1311), a second amino acid sequence being at least 90% homologous to SSDQIEQLHRRFKQLSGDQPTIRKENFNNVPDLELNPIRSKIVRAFFDNRNLRKGPSGLADEINFEDFLTIMSYFRPIDTTMDEEQVELSRKEKLRFLFHMYDSDSDGRITLEEYRNV corresponding to amino acids 74-191 of Q9NWT9 (SEQ ID NO:1704), which also corresponds to amino acids 45-162 of R00299_P3 (SEQ ID NO:1311), and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VEELLSGNPHIEKESARSIADGAMMEAASVCMGQMEPDQVYEGITFEDFLKIWQGIDIETKMHVRFLNMETMALCH (SEQ ID NO: 1770) corresponding to amino acids 163-238 of R00299_P3 (SEQ ID NO:1311), wherein said first, second and third amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a head of R00299_P3 (SEQ ID NO:1311), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MAEKALLCPSSAGLGTWPWVLNSAWPVLPLAVDQGVDWRPRGPV (SEQ ID NO: 1769) of R00299_P3 (SEQ ID NO:1311).

3. An isolated polypeptide encoding for a tail of R00299_P3 (SEQ ID NO:1311), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VEELLSGNPHIEKESARSIADGAMMEAASVCMGQMEPDQVYEGITFEDFLKIWQGIDIETKMHVRFLNMETMALCH (SEQ ID NO: 1770) in R00299_P3 (SEQ ID NO:1311).

Comparison report between R00299_P3 (SEQ ID NO:1311) and TESC_HUMAN (SEQ ID NO:1427):

1. An isolated chimeric polypeptide encoding for R00299_P3 (SEQ ID NO:1311), comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MAEKALLCPSSAGLGTWPWVLNSAWPVLPLAVDQGVDWRPRGPV (SEQ ID NO: 1769) corresponding to amino acids 1-44 of R00299_P3 (SEQ ID NO:1311), and a second amino acid sequence being at least 90% homologous to SSDQIEQLHRRFKQLSGDQPTIRKENFNNVPDLELNPIRSKIVRAFFDNRNLRKGPSGLADEINFEDFLTIMSYFRPIDTTMDEEQVELSRKEKLRFLFHMYDSDSDGRITLEEYRNVVEELLSGNPHIEKESARSIADGAMMEAASVCMGQMEPDQVYEGITFEDFLKIWQGIDIETKMHVRFLNMETMALCH (SEQ ID NO: 1770) corresponding to amino acids 21-214 of TESC_HUMAN (SEQ ID NO:1427), which also corresponds to amino acids 45-238 of R00299_P3 (SEQ ID NO:1311), wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a head of R00299_P3 (SEQ ID NO:1311), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MAEKALLCPSSAGLGTWPWVLNSAWPVLPLAVDQGVDWRPRGPV (SEQ ID NO: 1769) of R00299_P3 (SEQ ID NO:1311).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because one of the two signal-peptide prediction programs (HMM: Signal peptide, NN: NO) predicts that this protein has a signal peptide.

Variant protein R00299_P3 (SEQ ID NO:1311) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 280, (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R00299_P3 (SEQ ID NO:1311) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 280 Amino acid mutations
SNP position(s) on amino acid sequence    Alternative amino acid(s)    Previously known SNP?
120                                       R -> G                       No
120                                       R -> W                       No

Variant protein R00299_P3 (SEQ ID NO:1311) is encoded by the following transcript(s): R00299_T2 (SEQ ID NO:33), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R00299_T2 (SEQ ID NO:33) is shown in bold; this coding portion starts at position 142 and ends at position 855. The transcript also has the following SNPs as listed in Table 281 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R00299_P3 (SEQ ID NO:1311) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 281 Nucleic acid SNPs
SNP position on nucleotide sequence    Alternative nucleic acid    Previously known SNP?
177                                    C -> A                      Yes
499                                    C -> G                      No
499                                    C -> T                      No
900                                    G -> T                      Yes
916                                    G ->                        No
969                                    G ->                        No
969                                    G -> A                      No
987                                    A -> C                      No

As noted above, cluster R00299 features 12 segment(s), which were listed in Table 276 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster R00299_node_2 (SEQ ID NO:1216) according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R00299_T2 (SEQ ID NO:33). Table 282 below describes the starting and ending position of this segment on each transcript.

TABLE 282 Segment location on transcripts
Transcript name              Segment starting position    Segment ending position
R00299_T2 (SEQ ID NO: 33)    1                            271

Segment cluster R00299_node_30 (SEQ ID NO:1217) according to the present invention is supported by 75 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R00299_T2 (SEQ ID NO:33). Table 283 below describes the starting and ending position of this segment on each transcript.

TABLE 283 Segment location on transcripts
Transcript name              Segment starting position    Segment ending position
R00299_T2 (SEQ ID NO: 33)    790                          961

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.

Segment cluster R00299_node_10 (SEQ ID NO:1218) according to the present invention is supported by 46 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R00299_T2 (SEQ ID NO:33). Table 284 below describes the starting and ending position of this segment on each transcript.

TABLE 284 Segment location on transcripts
Transcript name              Segment starting position    Segment ending position
R00299_T2 (SEQ ID NO: 33)    346                          422

Segment cluster R00299_node_14 (SEQ ID NO:1219) according to the present invention is supported by 61 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R00299_T2 (SEQ ID NO:33). Table 285 below describes the starting and ending position of this segment on each transcript.

TABLE 285 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R00299_T2 (SEQ ID NO: 33) | 423 | 537

Segment cluster R00299_node_15 (SEQ ID NO:1220) according to the present invention can be found in the following transcript(s): R00299_T2 (SEQ ID NO:33). Table 286 below describes the starting and ending position of this segment on each transcript.

TABLE 286 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R00299_T2 (SEQ ID NO: 33) | 538 | 562

Segment cluster R00299_node_20 (SEQ ID NO:1221) according to the present invention is supported by 66 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R00299_T2 (SEQ ID NO:33). Table 287 below describes the starting and ending position of this segment on each transcript.

TABLE 287 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R00299_T2 (SEQ ID NO: 33) | 563 | 624

Segment cluster R00299_node_23 (SEQ ID NO:1222) according to the present invention is supported by 71 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R00299_T2 (SEQ ID NO:33). Table 288 below describes the starting and ending position of this segment on each transcript.

TABLE 288 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R00299_T2 (SEQ ID NO: 33) | 625 | 732

Segment cluster R00299_node_25 (SEQ ID NO:1223) according to the present invention is supported by 62 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R00299_T2 (SEQ ID NO:33). Table 289 below describes the starting and ending position of this segment on each transcript.

TABLE 289 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R00299_T2 (SEQ ID NO: 33) | 733 | 780

Segment cluster R00299_node_28 (SEQ ID NO:1224) according to the present invention can be found in the following transcript(s): R00299_T2 (SEQ ID NO:33). Table 290 below describes the starting and ending position of this segment on each transcript.

TABLE 290 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R00299_T2 (SEQ ID NO: 33) | 781 | 789

Segment cluster R00299_node_31 (SEQ ID NO:1225) according to the present invention is supported by 48 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R00299_T2 (SEQ ID NO:33). Table 291 below describes the starting and ending position of this segment on each transcript.

TABLE 291 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R00299_T2 (SEQ ID NO: 33) | 962 | 1069

Segment cluster R00299_node_5 (SEQ ID NO:1226) according to the present invention is supported by 45 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R00299_T2 (SEQ ID NO:33). Table 292 below describes the starting and ending position of this segment on each transcript.

TABLE 292 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R00299_T2 (SEQ ID NO: 33) | 272 | 341

Segment cluster R00299_node_9 (SEQ ID NO:1227) according to the present invention can be found in the following transcript(s): R00299_T2 (SEQ ID NO:33). Table 293 below describes the starting and ending position of this segment on each transcript.

TABLE 293 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R00299_T2 (SEQ ID NO: 33) | 342 | 345
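The segment coordinates given for transcript R00299_T2 in Tables 282 to 293 can be cross-checked with a short script. The following is a minimal sketch, not part of the described method: it assumes the positions are 1-based and inclusive, and the function names are illustrative only. It checks that the segments abut one another without gaps or overlaps and lists those no longer than about 120 bp.

```python
# Segment coordinates on transcript R00299_T2, as read from Tables 282-293.
# Assumption: positions are 1-based and inclusive.
SEGMENTS = {
    "R00299_node_2": (1, 271),
    "R00299_node_5": (272, 341),
    "R00299_node_9": (342, 345),
    "R00299_node_10": (346, 422),
    "R00299_node_14": (423, 537),
    "R00299_node_15": (538, 562),
    "R00299_node_20": (563, 624),
    "R00299_node_23": (625, 732),
    "R00299_node_25": (733, 780),
    "R00299_node_28": (781, 789),
    "R00299_node_30": (790, 961),
    "R00299_node_31": (962, 1069),
}

SHORT_SEGMENT_MAX_BP = 120  # "up to about 120 bp" in the text


def segment_length(start, end):
    """Length in bases of a 1-based, inclusive interval."""
    return end - start + 1


def find_gaps_or_overlaps(segments):
    """Return (previous, next) name pairs whose coordinates are not adjacent."""
    ordered = sorted(segments.items(), key=lambda item: item[1][0])
    problems = []
    for (prev_name, (_, prev_end)), (next_name, (next_start, _)) in zip(ordered, ordered[1:]):
        if next_start != prev_end + 1:
            problems.append((prev_name, next_name))
    return problems


def short_segments(segments, limit=SHORT_SEGMENT_MAX_BP):
    """Names of segments whose length does not exceed `limit` bases."""
    return sorted(
        name for name, (start, end) in segments.items()
        if segment_length(start, end) <= limit
    )


if __name__ == "__main__":
    print("Non-adjacent pairs:", find_gaps_or_overlaps(SEGMENTS))
    print("Short segments:", short_segments(SEGMENTS))
```

With the coordinates above, the twelve segments run from position 1 to position 1069 of the transcript, and only R00299_node_2 and R00299_node_30 exceed the 120 bp limit.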

Microarray (chip) data is also available for this gene as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotide was found to hit this segment (with regard to lung cancer), shown in Table 294.

TABLE 294 Oligonucleotide related to this gene
Oligonucleotide name | Overexpressed in cancers | Chip reference
R00299_0_8_0 (SEQ ID NO: 217) | lung cancer | Lung

Variant protein alignment to the previously known protein:

Sequence name: /tmp/OleVDhrKQ0/EjblgLomjM:Q9NWT9 (SEQ ID NO: 1704)
Sequence documentation:
Alignment of: R00299_P3 (SEQ ID NO: 1311) × Q9NWT9 (SEQ ID NO: 1704)
. . .
Alignment segment 1/1:
Quality: 1162.00
Escore: 0
Matching length: 118
Total length: 118
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

Sequence name: /tmp/OleVDhrKQ0/EjblgLomjM:TESC_HUMAN (SEQ ID NO: 1427)
Sequence documentation:
Alignment of: R00299_P3 (SEQ ID NO: 1311) × TESC_HUMAN (SEQ ID NO: 1427)
. . .
Alignment segment 1/1:
Quality: 1920.00
Escore: 0
Matching length: 194
Total length: 194
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

Description for Cluster W60282

Cluster W60282 features 1 transcript(s) and 6 segment(s) of interest, the names for which are given in Tables 295 and 296, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 297.

TABLE 295 Transcripts of interest
Transcript Name | Sequence ID No.
W60282_PEA_1_T11 | 34

TABLE 296 Segments of interest
Segment Name | Sequence ID No.
W60282_PEA_1_node_10 | 434
W60282_PEA_1_node_18 | 435
W60282_PEA_1_node_22 | 436
W60282_PEA_1_node_5 | 437
W60282_PEA_1_node_21 | 438
W60282_PEA_1_node_8 | 439

TABLE 297 Proteins of interest
Protein Name | Sequence ID No.
W60282_PEA_1_P14 | 1312

These sequences are variants of the known protein Kallikrein 11 precursor (SwissProt accession identifier KLKB_HUMAN; known also according to the synonyms EC 3.4.21.-; Hippostasin; Trypsin-like protease), SEQ ID NO: 1428, referred to herein as the previously known protein.

Protein Kallikrein 11 precursor (SEQ ID NO:1428) is known or believed to have the following function(s): Possible multifunctional protease. Efficiently cleaves bz-Phe-Arg-4-methylcoumaryl-7-amide, a kallikrein substrate, and weakly cleaves other substrates for kallikrein and trypsin. The sequence for protein Kallikrein 11 precursor is given at the end of the application, as “Kallikrein 11 precursor amino acid sequence”. Protein Kallikrein 11 precursor localization is believed to be Secreted.

The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: proteolysis and peptidolysis, which are annotation(s) related to Biological Process; and chymotrypsin; trypsin; serine-type peptidase; hydrolase, which are annotation(s) related to Molecular Function.

The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <dot expasy dot ch/sprot/>; or Locuslink, available from <dot ncbi dot nlm dot nih dot gov/projects/LocusLink/>.

As noted above, cluster W60282 features 1 transcript(s), which were listed in Table 295 above. These transcript(s) encode for protein(s) which are variant(s) of protein Kallikrein 11 precursor (SEQ ID NO:1428). A description of each variant protein according to the present invention is now provided.

Variant protein W60282_PEA_1_P14 (SEQ ID NO:1312) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) W60282_PEA_1_T11 (SEQ ID NO:34). An alignment is given to the known protein (Kallikrein 11 precursor (SEQ ID NO:1428)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between W60282_PEA_1_P14 (SEQ ID NO:1312) and Q8IXD7 (SEQ ID NO:1705):

1. An isolated chimeric polypeptide encoding for W60282_PEA_1_P14 (SEQ ID NO:1312), comprising a first amino acid sequence being at least 90% homologous to MRILQLILLALATGLVGGETRIIKGFECKPHSQPWQAALFEKTRLLCGATLIAPRWLLTAAHCLKP corresponding to amino acids 1-66 of Q8IXD7 (SEQ ID NO:1705), which also corresponds to amino acids 1-66 of W60282_PEA_1_P14 (SEQ ID NO:1312), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TPASHLAMRQHHHH (SEQ ID NO: 1771) corresponding to amino acids 67-80 of W60282_PEA_1_P14 (SEQ ID NO:1312), wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of W60282_PEA_1_P14 (SEQ ID NO:1312), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TPASHLAMRQHHHH (SEQ ID NO: 1771) in W60282_PEA_1_P14 (SEQ ID NO:1312).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein W60282_PEA_1_P14 (SEQ ID NO:1312) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 298 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein W60282_PEA_1_P14 (SEQ ID NO:1312) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 298 Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
17 | G -> E | Yes
41 | E -> K | No

Variant protein W60282_PEA_1_P14 (SEQ ID NO:1312) is encoded by the following transcript(s): W60282_PEA_1_T11 (SEQ ID NO:34), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript W60282_PEA_1_T11 (SEQ ID NO:34) is shown in bold; this coding portion starts at position 705 and ends at position 944. The transcript also has the following SNPs as listed in Table 299 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein W60282_PEA_1_P14 (SEQ ID NO:1312) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 299 Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
219 | A -> G | Yes
702 | G -> A | Yes
754 | G -> A | Yes
825 | G -> A | No
1289 | A -> G | Yes
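The nucleotide SNP positions of Table 299 can be related to the amino acid positions of Table 298 through the stated coding portion (positions 705 to 944). The sketch below is illustrative only and rests on several assumptions: positions are 1-based and inclusive, the coding portion is read in frame from its first base, and the variant protein is taken to be the 66-residue and 14-residue sequences quoted in the comparison report joined end to end. The function name is hypothetical.

```python
# Coding portion of transcript W60282_PEA_1_T11 as stated in the text
# (assumed 1-based, inclusive, in frame from the first base).
CDS_START, CDS_END = 705, 944

# Variant protein W60282_PEA_1_P14, assembled from the two sequences quoted in
# the comparison report: amino acids 1-66, then the 14-residue tail (67-80).
HEAD = "MRILQLILLALATGLVGGETRIIKGFECKPHSQPWQAALFEKTRLLCGATLIAPRWLLTAAHCLKP"
TAIL = "TPASHLAMRQHHHH"
PROTEIN = HEAD + TAIL

# SNP positions on the nucleotide sequence, from Table 299.
NUCLEOTIDE_SNPS = [219, 702, 754, 825, 1289]


def nucleotide_to_residue(nt_pos, cds_start=CDS_START, cds_end=CDS_END):
    """Map a 1-based transcript position to a 1-based amino acid position.

    Returns None when the position lies outside the coding portion.
    """
    if not cds_start <= nt_pos <= cds_end:
        return None
    return (nt_pos - cds_start) // 3 + 1


# For each coding SNP, record the residue position and the reference residue.
coding_snps = {}
for nt_pos in NUCLEOTIDE_SNPS:
    residue_pos = nucleotide_to_residue(nt_pos)
    if residue_pos is not None:
        coding_snps[nt_pos] = (residue_pos, PROTEIN[residue_pos - 1])

if __name__ == "__main__":
    for nt_pos, (residue_pos, residue) in sorted(coding_snps.items()):
        print(f"nucleotide {nt_pos} -> amino acid {residue_pos} ({residue})")
```

Under these assumptions, nucleotide positions 754 and 825 fall on amino acid positions 17 (G) and 41 (E), which agrees with the positions and reference residues listed in Table 298; the remaining three SNPs lie outside the coding portion.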

As noted above, cluster W60282 features 6 segment(s), which were listed in Table 296 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster W60282_PEA_1_node_10 (SEQ ID NO:1228) according to the present invention is supported by 45 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): W60282_PEA_1_T11 (SEQ ID NO:34). Table 300 below describes the starting and ending position of this segment on each transcript.

TABLE 300 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
W60282_PEA_1_T11 (SEQ ID NO: 34) | 745 | 901

Segment cluster W60282_PEA_1_node_18 (SEQ ID NO:1229) according to the present invention is supported by 49 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): W60282_PEA_1_T11 (SEQ ID NO:34). Table 301 below describes the starting and ending position of this segment on each transcript.

TABLE 301 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
W60282_PEA_1_T11 (SEQ ID NO: 34) | 902 | 1038

Segment cluster W60282_PEA_1_node_22 (SEQ ID NO:1230) according to the present invention is supported by 67 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): W60282_PEA_1_T11 (SEQ ID NO:34). Table 302 below describes the starting and ending position of this segment on each transcript.

TABLE 302 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
W60282_PEA_1_T11 (SEQ ID NO: 34) | 1072 | 1507

Segment cluster W60282_PEA_1_node_5 (SEQ ID NO:1231) according to the present invention is supported by 20 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): W60282_PEA_1_T11 (SEQ ID NO:34). Table 303 below describes the starting and ending position of this segment on each transcript.

TABLE 303 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
W60282_PEA_1_T11 (SEQ ID NO: 34) | 1 | 669

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.

Segment cluster W60282_PEA_1_node_21 (SEQ ID NO:1232) according to the present invention is supported by 48 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): W60282_PEA_1_T11 (SEQ ID NO:34). Table 304 below describes the starting and ending position of this segment on each transcript.

TABLE 304 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
W60282_PEA_1_T11 (SEQ ID NO: 34) | 1039 | 1071

Segment cluster W60282_PEA_1_node_8 (SEQ ID NO:1233) according to the present invention is supported by 39 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): W60282_PEA_1_T11 (SEQ ID NO:34). Table 305 below describes the starting and ending position of this segment on each transcript.

TABLE 305 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
W60282_PEA_1_T11 (SEQ ID NO: 34) | 670 | 744
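To see which of the six segments of transcript W60282_PEA_1_T11 contribute to its coding portion (positions 705 to 944), the segment intervals of Tables 300 to 305 can be intersected with that range. This is a minimal sketch assuming 1-based, inclusive positions; the helper name is illustrative and not taken from the application.

```python
# Segment coordinates on transcript W60282_PEA_1_T11, from Tables 300-305
# (assumed 1-based, inclusive).
SEGMENTS = {
    "W60282_PEA_1_node_5": (1, 669),
    "W60282_PEA_1_node_8": (670, 744),
    "W60282_PEA_1_node_10": (745, 901),
    "W60282_PEA_1_node_18": (902, 1038),
    "W60282_PEA_1_node_21": (1039, 1071),
    "W60282_PEA_1_node_22": (1072, 1507),
}

# Coding portion of the transcript as stated in the text.
CDS_START, CDS_END = 705, 944


def overlap_length(start, end, other_start, other_end):
    """Number of positions shared by two 1-based, inclusive intervals."""
    return max(0, min(end, other_end) - max(start, other_start) + 1)


# Coding bases contributed by each segment (segments with no overlap are omitted).
coding_contribution = {}
for name, (start, end) in SEGMENTS.items():
    shared = overlap_length(start, end, CDS_START, CDS_END)
    if shared > 0:
        coding_contribution[name] = shared

if __name__ == "__main__":
    for name, shared in coding_contribution.items():
        print(f"{name}: {shared} coding bases")
```

With these coordinates the coding portion spans the end of node_8, all of node_10 and the start of node_18, and the three contributions add up to the 240 bases between positions 705 and 944.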

Variant protein alignment to the previously known protein:

Sequence name: /tmp/rL7Wdc5hYg/eLOAfKIgqD:KLKB_HUMAN (SEQ ID NO: 1428)
Sequence documentation:
Alignment of: W60282_PEA_1_P14 (SEQ ID NO: 1312) × KLKB_HUMAN (SEQ ID NO: 1428)
. . .
Alignment segment 1/1:
Quality: 645.00
Escore: 0
Matching length: 72
Total length: 72
Matching Percent Similarity: 94.44
Matching Percent Identity: 94.44
Total Percent Similarity: 94.44
Total Percent Identity: 94.44
Gaps: 0
Alignment:

Sequence name: /tmp/rL7Wdc5hYg/eLOAfKIgqD:Q8IXD7 (SEQ ID NO: 1705)
Sequence documentation:
Alignment of: W60282_PEA_1_P14 (SEQ ID NO: 1312) × Q8IXD7 (SEQ ID NO: 1705)
Alignment segment 1/1:
Quality: 642.00
Escore: 0
Matching length: 66
Total length: 66
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
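The percentage figures in the alignment reports above can be read back into residue counts. The sketch below assumes, without confirmation from the application, that "Matching Percent Identity" is the number of identical aligned positions divided by the matching length, expressed as a percentage and rounded to two decimals; both function names are illustrative.

```python
def identical_positions(matching_length, percent_identity):
    """Estimate the number of identical positions from a reported percentage.

    Assumption: percent identity = 100 * identical positions / matching length.
    """
    return round(matching_length * percent_identity / 100)


def percent_identity(identical, matching_length):
    """Percent identity rounded to two decimals, as printed in the reports."""
    return round(100 * identical / matching_length, 2)


if __name__ == "__main__":
    # Alignment to KLKB_HUMAN: matching length 72, identity 94.44.
    print(identical_positions(72, 94.44))
    # Alignment to Q8IXD7: matching length 66, identity 100.00.
    print(identical_positions(66, 100.00))
```

On that reading, the alignment to KLKB_HUMAN corresponds to 68 identical positions out of 72, and the alignment to Q8IXD7 to 66 out of 66.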

Description for Cluster Z41644

Cluster Z41644 features 1 transcript(s) and 21 segment(s) of interest, the names for which are given in Tables 306 and 307, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 308.

TABLE 306 Transcripts of interest
Transcript Name | Sequence ID No.
Z41644_PEA_1_T5 | 35

TABLE 307 Segments of interest
Segment Name | Sequence ID No.
Z41644_PEA_1_node_0 | 440
Z41644_PEA_1_node_11 | 441
Z41644_PEA_1_node_12 | 442
Z41644_PEA_1_node_15 | 443
Z41644_PEA_1_node_20 | 444
Z41644_PEA_1_node_24 | 445
Z41644_PEA_1_node_1 | 446
Z41644_PEA_1_node_10 | 447
Z41644_PEA_1_node_13 | 448
Z41644_PEA_1_node_16 | 449
Z41644_PEA_1_node_17 | 450
Z41644_PEA_1_node_19 | 451
Z41644_PEA_1_node_2 | 452
Z41644_PEA_1_node_21 | 453
Z41644_PEA_1_node_22 | 454
Z41644_PEA_1_node_23 | 455
Z41644_PEA_1_node_25 | 456
Z41644_PEA_1_node_3 | 457
Z41644_PEA_1_node_4 | 458
Z41644_PEA_1_node_6 | 459
Z41644_PEA_1_node_9 | 460

TABLE 308 Proteins of interest
Protein Name | Sequence ID No.
Z41644_PEA_1_P10 | 1313

These sequences are variants of the known protein Small inducible cytokine B14 precursor (SwissProt accession identifier SZ14_HUMAN; known also according to the synonyms CXCL14; Chemokine BRAK), SEQ ID NO:1429, referred to herein as the previously known protein.

The sequence for protein Small inducible cytokine B14 precursor (SEQ ID NO:1429) is given at the end of the application, as “Small inducible cytokine B14 precursor amino acid sequence”. Protein Small inducible cytokine B14 precursor localization is believed to be Secreted.

The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: chemotaxis; signal transduction; cell-cell signaling, which are annotation(s) related to Biological Process; and chemokine, which are annotation(s) related to Molecular Function.

The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <dot expasy dot ch/sprot/>; or Locuslink, available from <dot ncbi dot nlm dot nih dot gov/projects/LocusLink/>.

Cluster Z41644 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term “number” in the right hand column of the table and the numbers on the y-axis of FIG. 27 refer to weighted expression of ESTs in each category, as “parts per million” (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
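The "parts per million" figure described above is a simple ratio scaled by one million. The following sketch shows only that arithmetic; it does not model the weighting mentioned in the text, which is described elsewhere in the application, and the counts in the usage line are hypothetical values chosen for illustration.

```python
def parts_per_million(cluster_est_count, category_est_count):
    """Expression of a cluster's ESTs relative to all ESTs in a category, in ppm.

    Assumption: plain (unweighted) counts are used; the weighting referred to
    in the text is not reproduced here.
    """
    if category_est_count <= 0:
        raise ValueError("category_est_count must be positive")
    return 1_000_000 * cluster_est_count / category_est_count


if __name__ == "__main__":
    # Hypothetical counts: 3 ESTs of the cluster among 500,000 ESTs in the category.
    print(parts_per_million(3, 500_000))
```

A category in which the cluster accounts for 3 of 500,000 ESTs would thus be reported as 6 parts per million.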

Overall, the following results were obtained as shown with regard to the histograms in FIG. 27 and Table 309. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: lung malignant tumors, breast malignant tumors and pancreas carcinoma.

TABLE 309 Normal tissue distribution
Name of Tissue | Number
bone | 45
brain | 62
colon | 327
epithelial | 179
general | 104
head and neck | 10
kidney | 219
lung | 6
lymph nodes | 37
breast | 87
bone marrow | 0
muscle | 20
ovary | 36
pancreas | 0
prostate | 78
skin | 591
stomach | 109
Thyroid | 386
uterus | 218

TABLE 310 P values and ratios for expression in cancerous tissue
Name of Tissue | P1 | P2 | SP1 | R3 | SP2 | R4
bone | 4.9e−01 | 8.5e−01 | 1.8e−01 | 1.9 | 5.3e−01 | 1.0
brain | 6.7e−01 | 8.0e−01 | 9.1e−01 | 0.6 | 9.9e−01 | 0.4
colon | 6.4e−01 | 7.7e−01 | 9.7e−01 | 0.4 | 1 | 0.3
epithelial | 4.1e−01 | 9.4e−01 | 9.6e−01 | 0.7 | 1 | 0.4
general | 1.5e−01 | 9.4e−01 | 1.8e−01 | 1.0 | 1 | 0.5
head and neck | 1.9e−01 | 3.3e−01 | 4.6e−01 | 2.8 | 7.5e−01 | 1.5
kidney | 7.7e−01 | 8.2e−01 | 7.0e−01 | 0.7 | 9.5e−01 | 0.5
lung | 2.2e−01 | 5.0e−01 | 1.3e−04 | 8.7 | 8.1e−03 | 4.1
lymph nodes | 6.3e−01 | 8.7e−01 | 6.3e−01 | 1.2 | 9.2e−01 | 0.6
breast | 4.0e−01 | 6.5e−01 | 3.9e−04 | 3.5 | 2.9e−02 | 1.9
bone marrow | 1 | 6.7e−01 | 1 | 1.0 | 5.3e−01 | 1.9
muscle | 5.2e−01 | 6.1e−01 | 2.7e−01 | 3.2 | 6.3e−01 | 1.2
ovary | 6.7e−01 | 7.1e−01 | 7.6e−01 | 1.0 | 8.6e−01 | 0.8
pancreas | 2.2e−02 | 2.3e−02 | 5.7e−03 | 7.8 | 1.6e−03 | 8.2
prostate | 8.8e−01 | 9.0e−01 | 8.3e−01 | 0.6 | 9.3e−01 | 0.5
skin | 5.9e−01 | 6.9e−01 | 2.3e−01 | 0.3 | 1 | 0.0
stomach | 6.1e−01 | 8.9e−01 | 8.1e−01 | 0.7 | 9.9e−01 | 0.4
Thyroid | 7.0e−01 | 7.0e−01 | 9.9e−01 | 0.4 | 9.9e−01 | 0.4
uterus | 5.3e−01 | 8.2e−01 | 9.5e−01 | 0.5 | 1 | 0.3
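The values of Table 310 can be screened programmatically for tissues in which the cluster appears overexpressed. The sketch below is illustrative only. It assumes that R3 is an expression ratio and SP1 its associated P value (the columns are defined elsewhere in the application), and the two cut-offs are hypothetical values chosen for this example, not thresholds taken from the application.

```python
# (SP1, R3) per tissue, transcribed from Table 310.
# Assumption: R3 is an expression ratio and SP1 the P value associated with it.
TABLE_310 = {
    "bone": (1.8e-01, 1.9),
    "brain": (9.1e-01, 0.6),
    "colon": (9.7e-01, 0.4),
    "epithelial": (9.6e-01, 0.7),
    "general": (1.8e-01, 1.0),
    "head and neck": (4.6e-01, 2.8),
    "kidney": (7.0e-01, 0.7),
    "lung": (1.3e-04, 8.7),
    "lymph nodes": (6.3e-01, 1.2),
    "breast": (3.9e-04, 3.5),
    "bone marrow": (1.0, 1.0),
    "muscle": (2.7e-01, 3.2),
    "ovary": (7.6e-01, 1.0),
    "pancreas": (5.7e-03, 7.8),
    "prostate": (8.3e-01, 0.6),
    "skin": (2.3e-01, 0.3),
    "stomach": (8.1e-01, 0.7),
    "Thyroid": (9.9e-01, 0.4),
    "uterus": (9.5e-01, 0.5),
}

# Hypothetical cut-offs, chosen only for illustration.
MAX_P_VALUE = 0.01
MIN_RATIO = 3.0


def overexpressed_tissues(table, max_p=MAX_P_VALUE, min_ratio=MIN_RATIO):
    """Tissues whose P value is below max_p and whose ratio is at least min_ratio."""
    return sorted(
        tissue for tissue, (p_value, ratio) in table.items()
        if p_value < max_p and ratio >= min_ratio
    )


if __name__ == "__main__":
    print(overexpressed_tissues(TABLE_310))
```

With these illustrative cut-offs the screen returns breast, lung and pancreas, the same three tissues named in the text as showing overexpression of this cluster.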

As noted above, cluster Z41644 features 1 transcript(s), which were listed in Table 306 above. These transcript(s) encode for protein(s) which are variant(s) of protein Small inducible cytokine B14 precursor (SEQ ID NO:1429). A description of each variant protein according to the present invention is now provided.

Variant protein Z41644_PEA_1_P10 (SEQ ID NO:1313) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z41644_PEA_1_T5 (SEQ ID NO:35). An alignment is given to the known protein (Small inducible cytokine B14 precursor (SEQ ID NO:1429)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between Z41644_PEA_1_P10 (SEQ ID NO:1313) and SZ14_HUMAN (SEQ ID NO:1429):

1. An isolated chimeric polypeptide encoding for Z41644_PEA_1_P10 (SEQ ID NO:1313), comprising a first amino acid sequence being at least 90% homologous to MRLLAAALLLLLLALYTARVDGSKCKCSRKGPKIRYSDVKKLEMKPKYPHCEEKMVIITTKSVSRYRGQEHCLHPKLQSTKRFIKWYNAWNEKRR corresponding to amino acids 1-95 of SZ14_HUMAN (SEQ ID NO:1429), which also corresponds to amino acids 1-95 of Z41644_PEA_1_P10 (SEQ ID NO:1313), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence YAPPLLTFLPTRPSCGSQDGKGPPHQVI (SEQ ID NO: 1772) corresponding to amino acids 96-123 of Z41644_PEA_1_P10 (SEQ ID NO:1313), wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of Z41644_PEA_1_P10 (SEQ ID NO:1313), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence YAPPLLTFLPTRPSCGSQDGKGPPHQVI (SEQ ID NO: 1772) in Z41644_PEA_1_P10 (SEQ ID NO:1313).

Comparison report between Z41644_PEA_1_P10 (SEQ ID NO:1313) and Q9NS21 (SEQ ID NO: 1706):

1. An isolated chimeric polypeptide encoding for Z41644_PEA_1_P10 (SEQ ID NO:1313), comprising a first amino acid sequence being at least 90% homologous to MRLLAAALLLLLLALYTARVDGSKCKCSRKGPKIRYSDVKKLEMKPKYPHCEEKMVIITTKSVSRYRGQEHCLHPKLQSTKRFIKWYNAWNEKRR corresponding to amino acids 13-107 of Q9NS21 (SEQ ID NO:1706), which also corresponds to amino acids 1-95 of Z41644_PEA_1_P10 (SEQ ID NO:1313), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence YAPPLLTFLPTRPSCGSQDGKGPPHQVI (SEQ ID NO: 1772) corresponding to amino acids 96-123 of Z41644_PEA_1_P10 (SEQ ID NO:1313), wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of Z41644_PEA_1_P10 (SEQ ID NO:1313), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence YAPPLLTFLPTRPSCGSQDGKGPPHQVI (SEQ ID NO: 1772) in Z41644_PEA_1_P10 (SEQ ID NO:1313).

Comparison report between Z41644_PEA_1_P10 (SEQ ID NO:1313) and AAQ89265 (SEQ ID NO:781):

1. An isolated chimeric polypeptide encoding for Z41644_PEA_1_P10 (SEQ ID NO:1313), comprising a first amino acid sequence being at least 90% homologous to MRLLAAALLLLLLALYTARVDGSKCKCSRKGPKIRYSDVKKLEMKPKYPHCEEKMVIITTKSVSRYRGQEHCLHPKLQSTKRFIKWYNAWNEKRR corresponding to amino acids 13-107 of AAQ89265 (SEQ ID NO:781), which also corresponds to amino acids 1-95 of Z41644_PEA_1_P10 (SEQ ID NO:1313), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence YAPPLLTFLPTRPSCGSQDGKGPPHQVI (SEQ ID NO: 1772) corresponding to amino acids 96-123 of Z41644_PEA_1_P10 (SEQ ID NO:1313), wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of Z41644_PEA_1_P10 (SEQ ID NO:1313), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence YAPPLLTFLPTRPSCGSQDGKGPPHQVI (SEQ ID NO: 1772) in Z41644_PEA_1_P10 (SEQ ID NO:1313).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein Z41644_PEA_1_P10 (SEQ ID NO:1313) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 311 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z41644_PEA_1_P10 (SEQ ID NO:1313) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 311 Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
32 | P -> H | Yes
64 | S -> | No
80 | T -> A | No
80 | T -> P | No

Variant protein Z41644_PEA_1_P10 (SEQ ID NO:1313) is encoded by the following transcript(s): Z41644_PEA_1_T5 (SEQ ID NO:35), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z41644_PEA_1_T5 (SEQ ID NO:35) is shown in bold; this coding portion starts at position 744 and ends at position 1112. The transcript also has the following SNPs as listed in Table 312 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z41644_PEA_1_P10 (SEQ ID NO:1313) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 312 Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
102 | A -> G | Yes
572 | C -> | No
3707 | C -> T | Yes
3735 | C -> T | Yes
4079 | G -> A | No
4123 | G -> A | Yes
4233 | A -> G | Yes
4328 | C -> | No
4350 | A -> G | Yes
4376 | G -> A | Yes
4390 | A -> G | Yes
4619 | G -> T | Yes
838 | C -> A | Yes
4754 | C -> T | No
4757 | C -> A | No
4794 | T -> G | No
4827 | G -> | No
934 | C -> | No
981 | A -> C | No
981 | A -> G | No
1817 | A -> C | Yes
2546 | T -> | No
2684 | T -> A | No
2885 | T -> C | Yes
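Most of the SNP positions in Table 312 lie outside the stated coding portion of transcript Z41644_PEA_1_T5 (positions 744 to 1112). The sketch below partitions the listed positions by region and maps the coding ones to amino acid positions. It assumes 1-based, inclusive positions and an in-frame coding portion; the region labels "upstream" and "downstream" are illustrative names introduced here, not terms from the application.

```python
from collections import Counter

# Coding portion of transcript Z41644_PEA_1_T5 as stated in the text
# (assumed 1-based, inclusive, in frame from the first base).
CDS_START, CDS_END = 744, 1112

# SNP positions from Table 312, in table order
# (position 981 is listed twice, once per alternative base).
SNP_POSITIONS = [
    102, 572, 3707, 3735, 4079, 4123, 4233, 4328, 4350, 4376, 4390, 4619,
    838, 4754, 4757, 4794, 4827, 934, 981, 981, 1817, 2546, 2684, 2885,
]


def region(nt_pos):
    """Classify a transcript position relative to the coding portion."""
    if nt_pos < CDS_START:
        return "upstream"
    if nt_pos > CDS_END:
        return "downstream"
    return "coding"


def residue_position(nt_pos):
    """1-based amino acid position for a position inside the coding portion."""
    return (nt_pos - CDS_START) // 3 + 1


region_counts = Counter(region(pos) for pos in SNP_POSITIONS)
coding_residues = sorted(
    {residue_position(pos) for pos in SNP_POSITIONS if region(pos) == "coding"}
)
codon_count = (CDS_END - CDS_START + 1) // 3

if __name__ == "__main__":
    print(dict(region_counts))
    print(coding_residues)
    print(codon_count)
```

Under these assumptions, four of the 24 entries fall within the coding portion, and they map to amino acid positions 32, 64 and 80, the positions listed in Table 311. The coding portion holds 123 codons, matching the 95-residue and 28-residue sequences that together make up amino acids 1-123 of the variant protein.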

As noted above, cluster Z41644 features 21 segment(s), which were listed in Table 307 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster Z41644_PEA_1_node_0 (SEQ ID NO:1234) according to the present invention is supported by 53 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z41644_PEA_1_T5 (SEQ ID NO:35). Table 313 below describes the starting and ending position of this segment on each transcript.

TABLE 313 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z41644_PEA_1_T5 (SEQ ID NO: 35) | 1 | 616

Segment cluster Z41644_PEA_1_node_11 (SEQ ID NO:1235) according to the present invention is supported by 9 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z41644_PEA_1_T5 (SEQ ID NO:35). Table 314 below describes the starting and ending position of this segment on each transcript.

TABLE 314 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z41644_PEA_1_T5 (SEQ ID NO: 35) | 1028 | 2089

Segment cluster Z41644_PEA_1_node_12 (SEQ ID NO:1236) according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z41644_PEA_1_T5 (SEQ ID NO:35). Table 315 below describes the starting and ending position of this segment on each transcript.

TABLE 315 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z41644_PEA_1_T5 (SEQ ID NO: 35) | 2090 | 2350

Segment cluster Z41644_PEA_1_node_15 (SEQ ID NO:1237) according to the present invention is supported by 23 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z41644_PEA_1_T5 (SEQ ID NO:35). Table 316 below describes the starting and ending position of this segment on each transcript.

TABLE 316 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z41644_PEA_1_T5 (SEQ ID NO: 35) | 2368 | 3728

Segment cluster Z41644_PEA_1_node_20 (SEQ ID NO:1238) according to the present invention is supported by 260 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z41644_PEA_1_T5 (SEQ ID NO:35). Table 317 below describes the starting and ending position of this segment on each transcript.

TABLE 317 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z41644_PEA_1_T5 (SEQ ID NO: 35) | 3938 | 4506

Segment cluster Z41644_PEA_1_node_24 (SEQ ID NO:1239) according to the present invention is supported by 185 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z41644_PEA_1_T5 (SEQ ID NO:35). Table 318 below describes the starting and ending position of this segment on each transcript.

TABLE 318 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z41644_PEA_1_T5 (SEQ ID NO: 35) | 4637 | 4799

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.

Segment cluster Z41644_PEA_1_node_1 (SEQ ID NO:1240) according to the present invention is supported by 53 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z41644_PEA_1_T5 (SEQ ID NO:35). Table 319 below describes the starting and ending position of this segment on each transcript.

TABLE 319 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z41644_PEA_1_T5 (SEQ ID NO: 35) | 617 | 697

Segment cluster Z41644_PEA_1_node_10 (SEQ ID NO:1241) according to the present invention is supported by 138 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z41644_PEA_1_T5 (SEQ ID NO:35). Table 320 below describes the starting and ending position of this segment on each transcript.

TABLE 320 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
Z41644_PEA_1_T5 (SEQ ID NO: 35) | 972 | 1027

Segment cluster Z41644_PEA_1_node_13 (SEQ ID NO:1242) according to the present invention can be found in the following transcript(s): Z41644_PEA_1_T5 (SEQ ID NO:35). Table 321 below describes the starting and ending position of this segment on each transcript.

TABLE 321 Segment location on transcripts
Transcript name                    Segment starting position    Segment ending position
Z41644_PEA_1_T5 (SEQ ID NO: 35)    2351                         2367

Segment cluster Z41644_PEA_1_node_16 (SEQ ID NO:1243) according to the present invention is supported by 152 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z41644_PEA_1_T5 (SEQ ID NO:35). Table 322 below describes the starting and ending position of this segment on each transcript.

TABLE 322 Segment location on transcripts
Transcript name                    Segment starting position    Segment ending position
Z41644_PEA_1_T5 (SEQ ID NO: 35)    3729                         3809

Segment cluster Z41644_PEA_1_node_17 (SEQ ID NO:1244) according to the present invention can be found in the following transcript(s): Z41644_PEA_1_T5 (SEQ ID NO:35). Table 323 below describes the starting and ending position of this segment on each transcript.

TABLE 323 Segment location on transcripts
Transcript name                    Segment starting position    Segment ending position
Z41644_PEA_1_T5 (SEQ ID NO: 35)    3810                         3829

Segment cluster Z41644_PEA_1_node_19 (SEQ ID NO:1245) according to the present invention is supported by 112 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z41644_PEA_1_T5 (SEQ ID NO:35). Table 324 below describes the starting and ending position of this segment on each transcript.

TABLE 324 Segment location on transcripts
Transcript name                    Segment starting position    Segment ending position
Z41644_PEA_1_T5 (SEQ ID NO: 35)    3830                         3937

Segment cluster Z41644_PEA_1_node_2 (SEQ ID NO:1246) according to the present invention is supported by 58 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z41644_PEA_1_T5 (SEQ ID NO:35). Table 325 below describes the starting and ending position of this segment on each transcript.

TABLE 325 Segment location on transcripts
Transcript name                    Segment starting position    Segment ending position
Z41644_PEA_1_T5 (SEQ ID NO: 35)    698                          737

Segment cluster Z41644_PEA_1_node_21 (SEQ ID NO:1247) according to the present invention can be found in the following transcript(s): Z41644_PEA_1_T5 (SEQ ID NO:35). Table 326 below describes the starting and ending position of this segment on each transcript.

TABLE 326 Segment location on transcripts
Transcript name                    Segment starting position    Segment ending position
Z41644_PEA_1_T5 (SEQ ID NO: 35)    4507                         4529

Segment cluster Z41644_PEA_1_node_22 (SEQ ID NO:1248) according to the present invention is supported by 164 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z41644_PEA_1_T5 (SEQ ID NO:35). Table 327 below describes the starting and ending position of this segment on each transcript.

TABLE 327 Segment location on transcripts
Transcript name                    Segment starting position    Segment ending position
Z41644_PEA_1_T5 (SEQ ID NO: 35)    4530                         4582

Segment cluster Z41644_PEA_1_node_23 (SEQ ID NO:1249) according to the present invention is supported by 169 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z41644_PEA_1_T5 (SEQ ID NO:35). Table 328 below describes the starting and ending position of this segment on each transcript.

TABLE 328 Segment location on transcripts
Transcript name                    Segment starting position    Segment ending position
Z41644_PEA_1_T5 (SEQ ID NO: 35)    4583                         4636

Segment cluster Z41644_PEA_1_node_25 (SEQ ID NO:1250) according to the present invention is supported by 138 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z41644_PEA_1_T5 (SEQ ID NO:35). Table 329 below describes the starting and ending position of this segment on each transcript.

TABLE 329 Segment location on transcripts
Transcript name                    Segment starting position    Segment ending position
Z41644_PEA_1_T5 (SEQ ID NO: 35)    4800                         4902

Segment cluster Z41644_PEA_1_node_3 (SEQ ID NO:1251) according to the present invention is supported by 75 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z41644_PEA_1_T5 (SEQ ID NO:35). Table 330 below describes the starting and ending position of this segment on each transcript.

TABLE 330 Segment location on transcripts
Transcript name                    Segment starting position    Segment ending position
Z41644_PEA_1_T5 (SEQ ID NO: 35)    738                          773

Segment cluster Z41644_PEA_1_node_4 (SEQ ID NO:1252) according to the present invention is supported by 61 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z41644_PEA_1_T5 (SEQ ID NO:35). Table 331 below describes the starting and ending position of this segment on each transcript.

TABLE 331 Segment location on transcripts
Transcript name                    Segment starting position    Segment ending position
Z41644_PEA_1_T5 (SEQ ID NO: 35)    774                          807

Segment cluster Z41644_PEA_1_node_6 (SEQ ID NO:1253) according to the present invention is supported by 101 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z41644_PEA_1_T5 (SEQ ID NO:35). Table 332 below describes the starting and ending position of this segment on each transcript.

TABLE 332 Segment location on transcripts
Transcript name                    Segment starting position    Segment ending position
Z41644_PEA_1_T5 (SEQ ID NO: 35)    808                          913

Segment cluster Z41644_PEA_1_node_9 (SEQ ID NO:1254) according to the present invention is supported by 134 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z41644_PEA_1_T5 (SEQ ID NO:35). Table 333 below describes the starting and ending position of this segment on each transcript.

TABLE 333 Segment location on transcripts
Transcript name                    Segment starting position    Segment ending position
Z41644_PEA_1_T5 (SEQ ID NO: 35)    914                          971
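The coordinates in Tables 317 to 333 can be cross-checked against the "up to about 120 bp" criterion for short segments. The following is a minimal sketch in Python, not part of the original method; the dictionary name, the helper function and the contiguity check are illustrative assumptions, and only the segments listed in these tables are included.

```python
# Segment coordinates on transcript Z41644_PEA_1_T5, copied from Tables 317-333
# (1-based, inclusive start and end positions).
SEGMENTS = {
    "node_1": (617, 697),
    "node_2": (698, 737),
    "node_3": (738, 773),
    "node_4": (774, 807),
    "node_6": (808, 913),
    "node_9": (914, 971),
    "node_10": (972, 1027),
    "node_13": (2351, 2367),
    "node_16": (3729, 3809),
    "node_17": (3810, 3829),
    "node_19": (3830, 3937),
    "node_20": (3938, 4506),
    "node_21": (4507, 4529),
    "node_22": (4530, 4582),
    "node_23": (4583, 4636),
    "node_24": (4637, 4799),
    "node_25": (4800, 4902),
}

SHORT_LIMIT = 120  # "up to about 120 bp", as stated in the text


def segment_length(start, end):
    """Length of a segment given 1-based inclusive coordinates."""
    return end - start + 1


lengths = {name: segment_length(*pos) for name, pos in SEGMENTS.items()}

# Segments longer than the short-segment limit.
long_segments = sorted(name for name, n in lengths.items() if n > SHORT_LIMIT)

# Pairs of neighbouring listed segments that do not abut on the transcript.
ordered = sorted(SEGMENTS.items(), key=lambda item: item[1][0])
gaps = [
    (a, b)
    for (a, (_, a_end)), (b, (b_start, _)) in zip(ordered, ordered[1:])
    if b_start != a_end + 1
]

print(long_segments)
print(gaps)
```

Under these assumptions, only node_20 and node_24 exceed 120 bp, which matches their placement before the short-segment paragraph. The two gaps between listed segments are presumably covered by segments described elsewhere in the application.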

Variant protein alignment to the previously known protein:

Sequence name: /tmp/p5SSvhT9Xp/HQeIMsUrfm:SZ14_HUMAN (SEQ ID NO: 1429)
Sequence documentation:
Alignment of: Z41644_PEA_1_P10 (SEQ ID NO: 1313) × SZ14_HUMAN (SEQ ID NO: 1429) . . .
Alignment segment 1/1:
Quality: 953.00
Escore: 0
Matching length: 95
Total length: 95
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

Sequence name: /tmp/p5SSvhT9Xp/HQeIMsUrfm:Q9NS21 (SEQ ID NO: 1706)
Sequence documentation:
Alignment of: Z41644_PEA_1_P10 (SEQ ID NO: 1313) × Q9NS21 (SEQ ID NO: 1706)
Alignment segment 1/1:
Quality: 957.00
Escore: 0
Matching length: 96
Total length: 96
Matching Percent Similarity: 100.00
Matching Percent Identity: 98.96
Total Percent Similarity: 100.00
Total Percent Identity: 98.96
Gaps: 0
Alignment:

Sequence name: /tmp/p5SSvhT9Xp/HQeIMsUrfm:AAQ89265 (SEQ ID NO: 781)
Sequence documentation:
Alignment of: Z41644_PEA_1_P10 (SEQ ID NO: 1313) × AAQ89265 (SEQ ID NO: 781)
Alignment segment 1/1:
Quality: 953.00
Escore: 0
Matching length: 95
Total length: 95
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

Description for Cluster Z44808

Cluster Z44808 features 5 transcript(s) and 21 segment(s) of interest, the names for which are given in Tables 334 and 335, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 336.

TABLE 334 Transcripts of interest
Transcript Name       Sequence ID No.
Z44808_PEA_1_T11      36
Z44808_PEA_1_T4       37
Z44808_PEA_1_T5       38
Z44808_PEA_1_T8       39
Z44808_PEA_1_T9       40

TABLE 335 Segments of interest
Segment Name            Sequence ID No.
Z44808_PEA_1_node_0     461
Z44808_PEA_1_node_16    462
Z44808_PEA_1_node_2     463
Z44808_PEA_1_node_24    464
Z44808_PEA_1_node_32    465
Z44808_PEA_1_node_33    466
Z44808_PEA_1_node_36    467
Z44808_PEA_1_node_37    468
Z44808_PEA_1_node_41    469
Z44808_PEA_1_node_11    470
Z44808_PEA_1_node_13    471
Z44808_PEA_1_node_18    472
Z44808_PEA_1_node_22    473
Z44808_PEA_1_node_26    474
Z44808_PEA_1_node_30    475
Z44808_PEA_1_node_34    476
Z44808_PEA_1_node_35    477
Z44808_PEA_1_node_39    478
Z44808_PEA_1_node_4     479
Z44808_PEA_1_node_6     480
Z44808_PEA_1_node_8     481

TABLE 336 Proteins of interest
Protein Name          Sequence ID No.
Z44808_PEA_1_P5       1314
Z44808_PEA_1_P6       1315
Z44808_PEA_1_P7       1316
Z44808_PEA_1_P11      1317

These sequences are variants of the known protein SPARC related modular calcium-binding protein 2 precursor (SwissProt accession identifier SMO2_HUMAN; known also according to the synonyms Secreted modular calcium-binding protein 2; SMOC-2; Smooth muscle-associated protein 2; SMAP-2; MSTP 117), SEQ ID NO: 1430, referred to herein as the previously known protein.

Protein SPARC related modular calcium-binding protein 2 precursor (SEQ ID NO:1430) is known or believed to have the following function(s): calcium binding. The sequence for protein SPARC related modular calcium-binding protein 2 precursor is given at the end of the application, as "SPARC related modular calcium-binding protein 2 precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 337.

TABLE 337 Amino acid mutations for Known Protein
SNP position(s) on amino acid sequence    Comment
169-170                                   KT −> TR
212                                       S −> P
429-446                                   TPRGHAESTSNRQPRXQG −> RSKRNL
434                                       A −> V
439                                       N −> Y

Protein SPARC related modular calcium-binding protein 2 precursor (SEQ ID NO:1430) localization is believed to be Secreted.

Cluster Z44808 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the right hand column of the table and the numbers on the y-axis of FIG. 28 refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
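The "parts per million" ratio described above can be illustrated with a short Python sketch. It shows only the plain ratio; the weighting applied in the previously described methods is not reproduced here, and the function name and the EST counts are hypothetical values chosen for illustration.

```python
def parts_per_million(cluster_est_count, total_est_count):
    """Ratio of cluster ESTs to all ESTs in a tissue category, scaled to ppm.

    Assumption: an unweighted ratio. Any weighting used by the original
    method is not modeled here.
    """
    if total_est_count <= 0:
        raise ValueError("total_est_count must be positive")
    return cluster_est_count * 1_000_000 / total_est_count


# Hypothetical example: 12 ESTs of a cluster among 400,000 ESTs in a category.
example_ppm = parts_per_million(12, 400_000)
print(example_ppm)
```

A value of 0 for a tissue in the normal tissue distribution table would correspond, under this reading, to no ESTs of the cluster being observed in that category.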

Overall, the following results were obtained as shown with regard to the histograms in FIG. 28 and Table 338. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: colorectal cancer, lung cancer and pancreas carcinoma.

TABLE 338 Normal tissue distribution
Name of Tissue    Number
bladder           123
bone              304
brain             18
colon             0
epithelial        40
general           37
kidney            2
lung              0
breast            61
ovary             116
pancreas          0
prostate          128
stomach           36
uterus            195

TABLE 339 P values and ratios for expression in cancerous tissue
Name of Tissue    P1         P2         SP1        R3      SP2        R4
bladder           6.8e−01    7.6e−01    7.7e−01    0.8     9.1e−01    0.6
bone              7.0e−01    8.8e−01    9.9e−01    0.3     1          0.2
brain             6.8e−01    7.2e−01    3.0e−02    2.6     1.7e−01    1.6
colon             9.2e−03    1.3e−02    1.2e−01    3.6     1.6e−01    3.1
epithelial        2.1e−02    4.0e−01    1.0e−04    1.9     2.7e−01    1.0
general           2.6e−02    7.2e−01    4.9e−07    1.9     3.0e−01    1.0
kidney            7.3e−01    8.1e−01    1          1.0     1          1.0
lung              4.0e−03    1.8e−02    8.0e−04    12.2    2.1e−02    6.0
breast            4.8e−01    6.1e−01    9.8e−02    2.0     3.9e−01    1.2
ovary             8.1e−01    8.3e−01    9.1e−01    0.6     9.7e−01    0.5
pancreas          1.2e−01    2.1e−01    1.0e−03    6.5     5.9e−03    4.6
prostate          8.4e−01    8.9e−01    9.0e−01    0.6     9.8e−01    0.4
stomach           5.0e−01    8.7e−01    9.6e−04    1.5     1.9e−01    0.8
uterus            6.7e−01    7.9e−01    9.2e−01    0.5     1          0.3

As noted above, cluster Z44808 features 5 transcript(s), which were listed in Table 334 above. These transcript(s) encode for protein(s) which are variant(s) of protein SPARC related modular calcium-binding protein 2 precursor (SEQ ID NO:1430). A description of each variant protein according to the present invention is now provided.

Variant protein Z44808_PEA_1_P5 (SEQ ID NO:1314) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z44808_PEA_1_T4 (SEQ ID NO:37). An alignment is given to the known protein (SPARC related modular calcium-binding protein 2 precursor (SEQ ID NO:1430)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between Z44808_PEA_1_P5 (SEQ ID NO:1314) and SMO2_HUMAN (SEQ ID NO:1430):

1. An isolated chimeric polypeptide encoding for Z44808_PEA_(—)1_P5 (SEQID NO:1314), comprising a first amino acid sequence being at least 90%homologous toMLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQKPLCASDGRTFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQARKEFQQVFIPECNDDGTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKTPRCPGSVNEKLPQREGTGKTDDAAAPALETQPQGDEEDIASRYPTLWTEQVKSRQNKTNKNSVSSCDQEHQSALEEAKQPNDNVVIPECAHGGLYKPVQCHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPAKARDLYKGRQLQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEERVVHWYFKLLDKNSSGDIGKKEIKPFKFLRKKSKPKKCVKKFVEYCDVNNDKSISVQELMGCLGVAKEDGKADTKKRHTPRGHAESTSNRQ corresponding to amino acids 1-441 of SMO2_HUMAN(SEQ ID NO:1430), which also corresponds to amino acids 1-441 ofZ44808_PEA_(—)1_P5 (SEQ ID NO:1314), and a second amino acid sequencebeing at least 70%, optionally at least 80%, preferably at least 85%,more preferably at least 90% and most preferably at least 95% homologousto a polypeptide having the sequence DAMVVSSRPKATTHRKSRTLSRR (SEQ ID NO:1751) corresponding to amino acids 442-464 of
Z44808_PEA_1_P5 (SEQ ID NO:1314), wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of Z44808_PEA_1_P5 (SEQ ID NO:1314), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DAMVVSSRPKATTHRKSRTLSRR (SEQ ID NO: 1751) in Z44808_PEA_1_P5 (SEQ ID NO:1314).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein Z44808_PEA_1_P5 (SEQ ID NO:1314) is encoded by the following transcript(s): Z44808_PEA_1_T4 (SEQ ID NO:37), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z44808_PEA_1_T4 (SEQ ID NO:37) is shown in bold; this coding portion starts at position 586 and ends at position 1977. The transcript also has the following SNPs as listed in Table 340 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z44808_PEA_1_P5 (SEQ ID NO:1314) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 340 Nucleic acid SNPs
SNP position on nucleotide sequence    Alternative nucleic acid    Previously known SNP?
549                                    A -> G                      No
648                                    T -> G                      No
4403                                   G -> T                      No
4456                                   G -> A                      Yes
4964                                   G -> C                      Yes
1025                                   C ->                        No
1677                                   T -> C                      No
2691                                   C -> T                      Yes
3900                                   T -> C                      No
3929                                   G -> A                      Yes
4099                                   G -> T                      Yes
4281                                   T -> C                      No
4319                                   G -> C                      Yes

Variant protein Z44808_PEA_1_P6 (SEQ ID NO:1315) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z44808_PEA_1_T5 (SEQ ID NO:38). An alignment is given to the known protein (SPARC related modular calcium-binding protein 2 precursor (SEQ ID NO:1430)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between Z44808_PEA_1_P6 (SEQ ID NO:1315) and SMO2_HUMAN (SEQ ID NO:1430):

1. An isolated chimeric polypeptide encoding for Z44808_PEA_1_P6 (SEQ ID NO:1315), comprising a first amino acid sequence being at least 90% homologous to MLLPQLCWLPLLAGLLPPVPAQFSALTFLRVDQDKDKDCSLDCAGSPQKPLCASDGRTFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQARKEFQQVFIPECNDDGTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKTPRCPGSVNEKLPQREGTGKTDDAAAPALETQPQGDEEDIASRYPTLWTEQVKSRQNKTNKNSVSSCDQEHQSALEEAKQPKNDNVVIPECAHGGLYKPVQCHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPAKARDLYKGRQLQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEERVVHWYFKCLLDKNSSGDIGKKEIKPFKRFLRKKSPKKCVKCKFVEYCDVNNDKSISVQELMGCLGVAKEDGKADTKKRH corresponding to amino acids 1-428 of SMO2_HUMAN (SEQ ID NO:1430), which also corresponds to amino acids 1-428 of Z44808_PEA_1_P6 (SEQ ID NO:1315), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence RSKRNL (SEQ ID NO: 1752) corresponding to amino acids 429-434 of Z44808_PEA_1_P6 (SEQ ID NO:1315), wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of Z44808_PEA_1_P6 (SEQ ID NO:1315), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence RSKRNL (SEQ ID NO: 1752) in Z44808_PEA_1_P6 (SEQ ID NO:1315).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein Z44808_PEA_1_P6 (SEQ ID NO:1315) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 341 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z44808_PEA_1_P6 (SEQ ID NO:1315) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 341 Amino acid mutations
SNP position(s) on amino acid sequence    Alternative amino acid(s)    Previously known SNP?
147                                       A ->                         No

Variant protein Z44808_PEA_1_P6 (SEQ ID NO:1315) is encoded by the following transcript(s): Z44808_PEA_1_T5 (SEQ ID NO:38), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z44808_PEA_1_T5 (SEQ ID NO:38) is shown in bold; this coding portion starts at position 586 and ends at position 1887. The transcript also has the following SNPs as listed in Table 342 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z44808_PEA_1_P6 (SEQ ID NO:1315) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 342 Nucleic acid SNPs
SNP position on nucleotide sequence    Alternative nucleic acid    Previously known SNP?
549                                    A -> G                      No
648                                    T -> G                      No
2866                                   G -> A                      Yes
3374                                   G -> C                      Yes
1025                                   C ->                        No
1677                                   T -> C                      No
2310                                   T -> C                      No
2339                                   G -> A                      Yes
2509                                   G -> T                      Yes
2691                                   T -> C                      No
2729                                   G -> C                      Yes
2813                                   G -> T                      No

Variant protein Z44808_PEA_1_P7 (SEQ ID NO:1316) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z44808_PEA_1_T9 (SEQ ID NO:40). An alignment is given to the known protein (SPARC related modular calcium-binding protein 2 precursor (SEQ ID NO:1430)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between Z44808_PEA_1_P7 (SEQ ID NO:1316) and SMO2_HUMAN (SEQ ID NO:1430):

1. An isolated chimeric polypeptide encoding for Z44808_PEA_(—)1_P7 (SEQID NO:1316), comprising a first amino acid sequence being at least 90%homologous toMLLPQLCWLPLLAGLLPPVPAQFSALTFLRVDQDKDKDCSLDCAGSPQKPLCASDGRTFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQARKEFQQVFIPECNDDGTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKTPRCPGSVNEKLPQREGTGKTDDAAAPALETQPQGDEEDIASRYPTLWTEQVKSRQNKTNKNSVSSCDQEHQSALEEAKQPKNDNVVIPECAHGGLYKPVQCHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPAKARDLYKGRQLQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEERVVHWYFKLLDKNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNNDKSISVQELMGCLGVAKEDGKADTKKRHTPRGHAESTSNRQ corresponding to amino acids 1-441 of SMO2_HUMAN(SEQ ID NO:1430), which also corresponds to amino acids 1-441 ofZ44808_PEA_(—)1_P7 (SEQ ID NO: 1316), and a second amino acid sequencebeing at least 70%, optionally at least 80%, preferably at least 85%,more preferably at least 90% and most preferably at least 95% homologousto a polypeptide having the sequence LLWLRGKVSFYCF (SEQ ID NO: 1753)corresponding to amino acids 442-454 of Z44808_PEA_(—)1_P7 (SEQ IDNO:1316), wherein said first and second amino acid sequences arecontiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of Z44808_PEA_1_P7 (SEQ ID NO:1316), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LLWLRGKVSFYCF (SEQ ID NO: 1753) in Z44808_PEA_1_P7 (SEQ ID NO:1316).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein Z44808_PEA_1_P7 (SEQ ID NO:1316) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 343 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z44808_PEA_1_P7 (SEQ ID NO:1316) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 343 Amino acid mutations
SNP position(s) on amino acid sequence    Alternative amino acid(s)    Previously known SNP?
147                                       A ->                         No

Variant protein Z44808_PEA_1_P7 (SEQ ID NO:1316) is encoded by the following transcript(s): Z44808_PEA_1_T9 (SEQ ID NO:40), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z44808_PEA_1_T9 (SEQ ID NO:40) is shown in bold; this coding portion starts at position 586 and ends at position 1947. The transcript also has the following SNPs as listed in Table 344 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z44808_PEA_1_P7 (SEQ ID NO:1316) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 344 Nucleic acid SNPs
SNP position on nucleotide sequence    Alternative nucleic acid    Previously known SNP?
549                                    A -> G                      No
648                                    T -> G                      No
1025                                   C ->                        No
1677                                   T -> C                      No
2169                                   C -> A                      Yes

Variant protein Z44808_PEA_1_P11 (SEQ ID NO:1317) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z44808_PEA_1_T11 (SEQ ID NO:36). The identification of this transcript was performed using a non-EST based method for identification of alternative splicing, described in the following reference: "Sorek R et al., Genome Res. (2004) 14:1617-23." An alignment is given to the known protein (SPARC related modular calcium-binding protein 2 precursor (SEQ ID NO:1430)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between Z44808_PEA_1_P11 (SEQ ID NO:1317) and SMO2_HUMAN (SEQ ID NO:1430):

1. An isolated chimeric polypeptide encoding for Z44808_PEA_(—)1_P11(SEQ ID NO:1317), comprising a first amino acid sequence being at least90% homologous toMLLPQLCWLPLLAGLLPPVPAQKFSALTFLRVDQDKDKDCSLDCAGSPQKPLCASDGRTFLSRCEFQRAKCKDPQLEIAYRGNCKDVSRCVAERKYTQEQARKEFQQVFIPECNDDGTYSQVQCHSYTGYCWCVTPNGRPISGTAVAHKTPRCPGSVNEKLPQREGTGKT corresponding to amino acids 1-170 ofSMO2_HUMAN (SEQ ID NO:1430), which also corresponds to amino acids 1-170of Z44808_PEA_(—)1_P11 (SEQ ID NO:1317), and a second amino acidsequence being at least 90% homologous toDIASRYPTLWTEQVKSRQNKTNKNSVSSCDQEHQSALEEAKQPKNDNVVIPECAHGGLYKPVQCHPSTGYCWCVLVDTGRPIPGTSTRYEQPKCDNTARAHPAKARDLYKGRQLQGCPGAKKHEFLTSVLDALSTDMVHAASDPSSSSGRLSEPDPSHTLEERVVHWYFKLLDKNSSGDIGKKEIKPFKRFLRKKSKPKKCVKKFVEYCDVNNDKSISVQELMGCLGVAKEDGKADTKKRHTPRGHAESTSNRQPRKQG corresponding to aminoacids 188-446 of SMO2_HUMAN (SEQ ID NO:1430), which also corresponds toamino acids 171-429 of Z44808_PEA_(—)1_P11 (SEQ ID NO:1317), whereinsaid first and second amino acid sequences are contiguous and in asequential order.

2. An isolated chimeric polypeptide encoding for an edge portion of Z44808_PEA_1_P11 (SEQ ID NO:1317), comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise TD, having a structure as follows: a sequence starting from any of amino acid numbers 170−x to 170; and ending at any of amino acid numbers 171+((n−2)−x), in which x varies from 0 to n−2.
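The edge-portion structure above can be made concrete by enumerating the windows it describes. The following Python sketch is illustrative only; it assumes the start range reads "170−x to 170", that each value of x pairs the start 170−x with the end 171+((n−2)−x), and that the protein length is 429 (the last residue of Z44808_PEA_1_P11 given in the comparison report). The function name is hypothetical.

```python
def edge_windows(junction_left, n, protein_length):
    """List (start, end) windows of length n that span the junction.

    Coordinates are 1-based and inclusive. The junction lies between
    residues junction_left and junction_left + 1 (here 170 and 171, "TD").
    """
    windows = []
    for x in range(0, n - 1):  # x varies from 0 to n-2
        start = junction_left - x
        end = junction_left + 1 + ((n - 2) - x)
        # Keep only windows that fit inside the protein.
        if start >= 1 and end <= protein_length:
            windows.append((start, end))
    return windows


# Windows of the minimum stated length (n = 10) around the 170/171 junction.
windows = edge_windows(junction_left=170, n=10, protein_length=429)
for start, end in windows:
    print(start, end, end - start + 1)
```

Under this reading, every window has exactly n residues and always contains both residue 170 and residue 171, so the two junction amino acids are present in each candidate edge portion.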

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein Z44808_PEA_1_P11 (SEQ ID NO:1317) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 345 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z44808_PEA_1_P11 (SEQ ID NO:1317) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 345 Amino acid mutations
SNP position(s) on amino acid sequence    Alternative amino acid(s)    Previously known SNP?
147                                       A ->                         No

Variant protein Z44808_PEA_1_P11 (SEQ ID NO:1317) is encoded by the following transcript(s): Z44808_PEA_1_T11 (SEQ ID NO:36), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z44808_PEA_1_T11 (SEQ ID NO:36) is shown in bold; this coding portion starts at position 586 and ends at position 1872. The transcript also has the following SNPs as listed in Table 346 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z44808_PEA_1_P11 (SEQ ID NO:1317) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 346 Nucleic acid SNPs
SNP position on nucleotide sequence    Alternative nucleic acid    Previously known SNP?
549                                    A -> G                      No
648                                    T -> G                      No
2720                                   G -> A                      Yes
3228                                   G -> C                      Yes
1025                                   C ->                        No
1626                                   T -> C                      No
2164                                   T -> C                      No
2193                                   G -> A                      Yes
2363                                   G -> T                      Yes
2545                                   T -> C                      No
2583                                   G -> C                      Yes
2667                                   G -> T                      No
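The coding-portion coordinates quoted for the four variant transcripts can be cross-checked against the protein lengths implied by the comparison reports (464, 434, 454 and 429 amino acids for P5, P6, P7 and P11, respectively). This is a minimal Python sketch under the assumption that positions are 1-based and inclusive; the dictionary and function names are illustrative, not part of the application.

```python
# (coding start, coding end, last residue number from the comparison report)
CODING_PORTIONS = {
    "Z44808_PEA_1_T4 / P5": (586, 1977, 464),
    "Z44808_PEA_1_T5 / P6": (586, 1887, 434),
    "Z44808_PEA_1_T9 / P7": (586, 1947, 454),
    "Z44808_PEA_1_T11 / P11": (586, 1872, 429),
}


def codons_in(start, end):
    """Number of whole codons in a 1-based inclusive coding range."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"coding length {length} is not a multiple of 3")
    return length // 3


results = {
    name: (codons_in(start, end), residues)
    for name, (start, end, residues) in CODING_PORTIONS.items()
}

for name, (codons, residues) in results.items():
    status = "consistent" if codons == residues else "check"
    print(f"{name}: {codons} codons vs {residues} residues ({status})")
```

With these inputs, each coding portion contains exactly as many codons as the variant protein has residues, which suggests that the quoted end positions do not include a stop codon.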

As noted above, cluster Z44808 features 21 segment(s), which were listed in Table 335 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster Z44808_PEA_1_node_0 (SEQ ID NO:1255) according to the present invention is supported by 29 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11 (SEQ ID NO:36), Z44808_PEA_1_T4 (SEQ ID NO:37), Z44808_PEA_1_T5 (SEQ ID NO:38), Z44808_PEA_1_T8 (SEQ ID NO:39) and Z44808_PEA_1_T9 (SEQ ID NO:40). Table 347 below describes the starting and ending position of this segment on each transcript.

TABLE 347 Segment location on transcripts
Transcript name                     Segment starting position    Segment ending position
Z44808_PEA_1_T11 (SEQ ID NO: 36)    1                            669
Z44808_PEA_1_T4 (SEQ ID NO: 37)     1                            669
Z44808_PEA_1_T5 (SEQ ID NO: 38)     1                            669
Z44808_PEA_1_T8 (SEQ ID NO: 39)     1                            669
Z44808_PEA_1_T9 (SEQ ID NO: 40)     1                            669

Segment cluster Z44808_PEA_1_node_16 (SEQ ID NO:1256) according to the present invention is supported by 39 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11 (SEQ ID NO:36), Z44808_PEA_1_T4 (SEQ ID NO:37), Z44808_PEA_1_T5 (SEQ ID NO:38), Z44808_PEA_1_T8 (SEQ ID NO:39) and Z44808_PEA_1_T9 (SEQ ID NO:40). Table 348 below describes the starting and ending position of this segment on each transcript.

TABLE 348 Segment location on transcripts
Transcript name                     Segment starting position    Segment ending position
Z44808_PEA_1_T11 (SEQ ID NO: 36)    1172                         1358
Z44808_PEA_1_T4 (SEQ ID NO: 37)     1223                         1409
Z44808_PEA_1_T5 (SEQ ID NO: 38)     1223                         1409
Z44808_PEA_1_T8 (SEQ ID NO: 39)     1223                         1409
Z44808_PEA_1_T9 (SEQ ID NO: 40)     1223                         1409

Segment cluster Z44808_PEA_1_node_2 (SEQ ID NO:1257) according to the present invention is supported by 34 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11 (SEQ ID NO:36), Z44808_PEA_1_T4 (SEQ ID NO:37), Z44808_PEA_1_T5 (SEQ ID NO:38), Z44808_PEA_1_T8 (SEQ ID NO:39) and Z44808_PEA_1_T9 (SEQ ID NO:40). Table 349 below describes the starting and ending position of this segment on each transcript.

TABLE 349 Segment location on transcripts
Transcript name                     Segment starting position    Segment ending position
Z44808_PEA_1_T11 (SEQ ID NO: 36)    670                          841
Z44808_PEA_1_T4 (SEQ ID NO: 37)     670                          841
Z44808_PEA_1_T5 (SEQ ID NO: 38)     670                          841
Z44808_PEA_1_T8 (SEQ ID NO: 39)     670                          841
Z44808_PEA_1_T9 (SEQ ID NO: 40)     670                          841

Segment cluster Z44808_PEA_1_node_24 (SEQ ID NO:1258) according to the present invention is supported by 52 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11 (SEQ ID NO:36), Z44808_PEA_1_T4 (SEQ ID NO:37), Z44808_PEA_1_T5 (SEQ ID NO:38), Z44808_PEA_1_T8 (SEQ ID NO:39) and Z44808_PEA_1_T9 (SEQ ID NO:40). Table 350 below describes the starting and ending position of this segment on each transcript.

TABLE 350 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
Z44808_PEA_1_T11 (SEQ ID NO: 36) | 1545 | 1819
Z44808_PEA_1_T4 (SEQ ID NO: 37) | 1596 | 1870
Z44808_PEA_1_T5 (SEQ ID NO: 38) | 1596 | 1870
Z44808_PEA_1_T8 (SEQ ID NO: 39) | 1596 | 1870
Z44808_PEA_1_T9 (SEQ ID NO: 40) | 1596 | 1870

Segment cluster Z44808_PEA_1_node_32 (SEQ ID NO:1259) according to the present invention is supported by 17 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T4 (SEQ ID NO:37) and Z44808_PEA_1_T8 (SEQ ID NO:39). Table 351 below describes the starting and ending position of this segment on each transcript.

TABLE 351 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
Z44808_PEA_1_T4 (SEQ ID NO: 37) | 1909 | 3593
Z44808_PEA_1_T8 (SEQ ID NO: 39) | 1909 | 2397

Segment cluster Z44808_PEA_1_node_33 (SEQ ID NO:1260) according to the present invention is supported by 133 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11 (SEQ ID NO:36), Z44808_PEA_1_T4 (SEQ ID NO:37) and Z44808_PEA_1_T5 (SEQ ID NO:38). Table 352 below describes the starting and ending position of this segment on each transcript.

TABLE 352 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
Z44808_PEA_1_T11 (SEQ ID NO: 36) | 1858 | 2734
Z44808_PEA_1_T4 (SEQ ID NO: 37) | 3594 | 4470
Z44808_PEA_1_T5 (SEQ ID NO: 38) | 2004 | 2880

Segment cluster Z44808_PEA_1_node_36 (SEQ ID NO:1261) according to the present invention is supported by 117 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11 (SEQ ID NO:36), Z44808_PEA_1_T4 (SEQ ID NO:37) and Z44808_PEA_1_T5 (SEQ ID NO:38). Table 353 below describes the starting and ending position of this segment on each transcript.

TABLE 353 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
Z44808_PEA_1_T11 (SEQ ID NO: 36) | 2829 | 3080
Z44808_PEA_1_T4 (SEQ ID NO: 37) | 4565 | 4816
Z44808_PEA_1_T5 (SEQ ID NO: 38) | 2975 | 3226

Segment cluster Z44808_PEA_1_node_37 (SEQ ID NO:1262) according to the present invention is supported by 120 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11 (SEQ ID NO:36), Z44808_PEA_1_T4 (SEQ ID NO:37) and Z44808_PEA_1_T5 (SEQ ID NO:38). Table 354 below describes the starting and ending position of this segment on each transcript.

TABLE 354 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
Z44808_PEA_1_T11 (SEQ ID NO: 36) | 3081 | 3429
Z44808_PEA_1_T4 (SEQ ID NO: 37) | 4817 | 5165
Z44808_PEA_1_T5 (SEQ ID NO: 38) | 3227 | 3575

Segment cluster Z44808_PEA_1_node_41 (SEQ ID NO:1263) according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T9 (SEQ ID NO:40). Table 355 below describes the starting and ending position of this segment on each transcript.

TABLE 355 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
Z44808_PEA_1_T9 (SEQ ID NO: 40) | 1974 | 2206

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.

Segment cluster Z44808_PEA_1_node_11 (SEQ ID NO:1264) according to the present invention is supported by 25 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T4 (SEQ ID NO:37), Z44808_PEA_1_T5 (SEQ ID NO:38), Z44808_PEA_1_T8 (SEQ ID NO:39) and Z44808_PEA_1_T9 (SEQ ID NO:40). Table 356 below describes the starting and ending position of this segment on each transcript.

TABLE 356 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
Z44808_PEA_1_T4 (SEQ ID NO: 37) | 1097 | 1147
Z44808_PEA_1_T5 (SEQ ID NO: 38) | 1097 | 1147
Z44808_PEA_1_T8 (SEQ ID NO: 39) | 1097 | 1147
Z44808_PEA_1_T9 (SEQ ID NO: 40) | 1097 | 1147

Segment cluster Z44808_PEA_1_node_13 (SEQ ID NO:1265) according to the present invention is supported by 28 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11 (SEQ ID NO:36), Z44808_PEA_1_T4 (SEQ ID NO:37), Z44808_PEA_1_T5 (SEQ ID NO:38), Z44808_PEA_1_T8 (SEQ ID NO:39) and Z44808_PEA_1_T9 (SEQ ID NO:40). Table 357 below describes the starting and ending position of this segment on each transcript.

TABLE 357 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
Z44808_PEA_1_T11 (SEQ ID NO: 36) | 1097 | 1171
Z44808_PEA_1_T4 (SEQ ID NO: 37) | 1148 | 1222
Z44808_PEA_1_T5 (SEQ ID NO: 38) | 1148 | 1222
Z44808_PEA_1_T8 (SEQ ID NO: 39) | 1148 | 1222
Z44808_PEA_1_T9 (SEQ ID NO: 40) | 1148 | 1222

Segment cluster Z44808_PEA_1_node_18 (SEQ ID NO:1266) according to the present invention is supported by 27 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11 (SEQ ID NO:36), Z44808_PEA_1_T4 (SEQ ID NO:37), Z44808_PEA_1_T5 (SEQ ID NO:38), Z44808_PEA_1_T8 (SEQ ID NO:39) and Z44808_PEA_1_T9 (SEQ ID NO:40). Table 358 below describes the starting and ending position of this segment on each transcript.

TABLE 358 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
Z44808_PEA_1_T11 (SEQ ID NO: 36) | 1359 | 1441
Z44808_PEA_1_T4 (SEQ ID NO: 37) | 1410 | 1492
Z44808_PEA_1_T5 (SEQ ID NO: 38) | 1410 | 1492
Z44808_PEA_1_T8 (SEQ ID NO: 39) | 1410 | 1492
Z44808_PEA_1_T9 (SEQ ID NO: 40) | 1410 | 1492

Segment cluster Z44808_PEA_1_node_22 (SEQ ID NO:1267) according to the present invention is supported by 33 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11 (SEQ ID NO:36), Z44808_PEA_1_T4 (SEQ ID NO:37), Z44808_PEA_1_T5 (SEQ ID NO:38), Z44808_PEA_1_T8 (SEQ ID NO:39) and Z44808_PEA_1_T9 (SEQ ID NO:40). Table 359 below describes the starting and ending position of this segment on each transcript.

TABLE 359 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
Z44808_PEA_1_T11 (SEQ ID NO: 36) | 1442 | 1544
Z44808_PEA_1_T4 (SEQ ID NO: 37) | 1493 | 1595
Z44808_PEA_1_T5 (SEQ ID NO: 38) | 1493 | 1595
Z44808_PEA_1_T8 (SEQ ID NO: 39) | 1493 | 1595
Z44808_PEA_1_T9 (SEQ ID NO: 40) | 1493 | 1595

Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment (with regard to lung cancer), shown in Table 360.

TABLE 360 Oligonucleotides related to this segment

Oligonucleotide name | Overexpressed in cancers | Chip reference
Z44808_0_8_0 (SEQ ID NO: 218) | Lung squamous cell carcinoma | LUN

Segment cluster Z44808_PEA_1_node_26 (SEQ ID NO:1268) according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T5 (SEQ ID NO:38). Table 361 below describes the starting and ending position of this segment on each transcript.

TABLE 361 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
Z44808_PEA_1_T5 (SEQ ID NO: 38) | 1871 | 1965

Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment (with regard to lung cancer), shown in Table 362.

TABLE 362 Oligonucleotides related to this segment

Oligonucleotide name | Overexpressed in cancers | Chip reference
Z44808_0_0_72347 (SEQ ID NO: 219) | Lung small cell cancer | LUN

Segment cluster Z44808_PEA_1_node_30 (SEQ ID NO:1269) according to the present invention is supported by 44 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11 (SEQ ID NO:36), Z44808_PEA_1_T4 (SEQ ID NO:37), Z44808_PEA_1_T5 (SEQ ID NO:38), Z44808_PEA_1_T8 (SEQ ID NO:39) and Z44808_PEA_1_T9 (SEQ ID NO:40). Table 363 below describes the starting and ending position of this segment on each transcript.

TABLE 363 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
Z44808_PEA_1_T11 (SEQ ID NO: 36) | 1820 | 1857
Z44808_PEA_1_T4 (SEQ ID NO: 37) | 1871 | 1908
Z44808_PEA_1_T5 (SEQ ID NO: 38) | 1966 | 2003
Z44808_PEA_1_T8 (SEQ ID NO: 39) | 1871 | 1908
Z44808_PEA_1_T9 (SEQ ID NO: 40) | 1871 | 1908

Segment cluster Z44808_PEA_1_node_34 (SEQ ID NO:1270) according to the present invention is supported by 70 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11 (SEQ ID NO:36), Z44808_PEA_1_T4 (SEQ ID NO:37) and Z44808_PEA_1_T5 (SEQ ID NO:38). Table 364 below describes the starting and ending position of this segment on each transcript.

TABLE 364 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
Z44808_PEA_1_T11 (SEQ ID NO: 36) | 2735 | 2809
Z44808_PEA_1_T4 (SEQ ID NO: 37) | 4471 | 4545
Z44808_PEA_1_T5 (SEQ ID NO: 38) | 2881 | 2955

Segment cluster Z44808_PEA_1_node_35 (SEQ ID NO:1271) according to the present invention can be found in the following transcript(s): Z44808_PEA_1_T11 (SEQ ID NO:36), Z44808_PEA_1_T4 (SEQ ID NO:37) and Z44808_PEA_1_T5 (SEQ ID NO:38). Table 365 below describes the starting and ending position of this segment on each transcript.

TABLE 365 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
Z44808_PEA_1_T11 (SEQ ID NO: 36) | 2810 | 2828
Z44808_PEA_1_T4 (SEQ ID NO: 37) | 4546 | 4564
Z44808_PEA_1_T5 (SEQ ID NO: 38) | 2956 | 2974

Segment cluster Z44808_PEA_1_node_39 (SEQ ID NO:1272) according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T9 (SEQ ID NO:40). Table 366 below describes the starting and ending position of this segment on each transcript.

TABLE 366 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
Z44808_PEA_1_T9 (SEQ ID NO: 40) | 1909 | 1973

Segment cluster Z44808_PEA_1_node_4 (SEQ ID NO:1273) according to the present invention is supported by 33 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11 (SEQ ID NO:36), Z44808_PEA_1_T4 (SEQ ID NO:37), Z44808_PEA_1_T5 (SEQ ID NO:38), Z44808_PEA_1_T8 (SEQ ID NO:39) and Z44808_PEA_1_T9 (SEQ ID NO:40). Table 367 below describes the starting and ending position of this segment on each transcript.

TABLE 367 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
Z44808_PEA_1_T11 (SEQ ID NO: 36) | 842 | 948
Z44808_PEA_1_T4 (SEQ ID NO: 37) | 842 | 948
Z44808_PEA_1_T5 (SEQ ID NO: 38) | 842 | 948
Z44808_PEA_1_T8 (SEQ ID NO: 39) | 842 | 948
Z44808_PEA_1_T9 (SEQ ID NO: 40) | 842 | 948

Segment cluster Z44808_PEA_1_node_6 (SEQ ID NO:1274) according to the present invention is supported by 30 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11 (SEQ ID NO:36), Z44808_PEA_1_T4 (SEQ ID NO:37), Z44808_PEA_1_T5 (SEQ ID NO:38), Z44808_PEA_1_T8 (SEQ ID NO:39) and Z44808_PEA_1_T9 (SEQ ID NO:40). Table 368 below describes the starting and ending position of this segment on each transcript.

TABLE 368 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
Z44808_PEA_1_T11 (SEQ ID NO: 36) | 949 | 1048
Z44808_PEA_1_T4 (SEQ ID NO: 37) | 949 | 1048
Z44808_PEA_1_T5 (SEQ ID NO: 38) | 949 | 1048
Z44808_PEA_1_T8 (SEQ ID NO: 39) | 949 | 1048
Z44808_PEA_1_T9 (SEQ ID NO: 40) | 949 | 1048

Segment cluster Z44808_PEA_1_node_8 (SEQ ID NO:1275) according to the present invention is supported by 25 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z44808_PEA_1_T11 (SEQ ID NO:36), Z44808_PEA_1_T4 (SEQ ID NO:37), Z44808_PEA_1_T5 (SEQ ID NO:38), Z44808_PEA_1_T8 (SEQ ID NO:39) and Z44808_PEA_1_T9 (SEQ ID NO:40). Table 369 below describes the starting and ending position of this segment on each transcript.

TABLE 369 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
Z44808_PEA_1_T11 (SEQ ID NO: 36) | 1049 | 1096
Z44808_PEA_1_T4 (SEQ ID NO: 37) | 1049 | 1096
Z44808_PEA_1_T5 (SEQ ID NO: 38) | 1049 | 1096
Z44808_PEA_1_T8 (SEQ ID NO: 39) | 1049 | 1096
Z44808_PEA_1_T9 (SEQ ID NO: 40) | 1049 | 1096
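The segment tables above can also be read together. The following sketch, offered as an illustration only, collects the positions reported for transcript Z44808_PEA_1_T11 in Tables 347 through 369 and checks that consecutive segments abut without gaps or overlaps. Positions are assumed to be 1-based and inclusive; `T11_SEGMENTS` and `tiling_problems` are hypothetical names introduced for this example.

```python
# Segment coordinates on transcript Z44808_PEA_1_T11, copied from the
# "Segment location on transcripts" tables (Tables 347-369).
# Assumption: positions are 1-based and inclusive.
T11_SEGMENTS = {
    "node_0": (1, 669),
    "node_2": (670, 841),
    "node_4": (842, 948),
    "node_6": (949, 1048),
    "node_8": (1049, 1096),
    "node_13": (1097, 1171),
    "node_16": (1172, 1358),
    "node_18": (1359, 1441),
    "node_22": (1442, 1544),
    "node_24": (1545, 1819),
    "node_30": (1820, 1857),
    "node_33": (1858, 2734),
    "node_34": (2735, 2809),
    "node_35": (2810, 2828),
    "node_36": (2829, 3080),
    "node_37": (3081, 3429),
}


def tiling_problems(segments):
    """Return (previous, next) name pairs whose coordinates are not adjacent."""
    ordered = sorted(segments.items(), key=lambda item: item[1][0])
    problems = []
    for (prev_name, (_, prev_end)), (next_name, (next_start, _)) in zip(
        ordered, ordered[1:]
    ):
        if next_start != prev_end + 1:
            problems.append((prev_name, next_name))
    return problems


if __name__ == "__main__":
    print(tiling_problems(T11_SEGMENTS))
```

With the values as tabulated, the listed segments cover positions 1 through 3429 of this transcript contiguously; the same check could be repeated for the other transcripts.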

Variant protein alignment to the previously known protein:

Sequence name: /tmp/vUqLu6eAVZ/K3JDuPvaLo:SMO2_HUMAN (SEQ ID NO: 1430)
Sequence documentation: Alignment of: Z44808_PEA_1_P5 (SEQ ID NO: 1314) × SMO2_HUMAN (SEQ ID NO: 1430)
Alignment segment 1/1:
Quality: 4440.00
Escore: 0
Matching length: 441
Total length: 441
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

Sequence name: /tmp/QSUNfTsJ5y/kLOw5Vb6SD:SMO2_HUMAN (SEQ ID NO: 1430)
Sequence documentation: Alignment of: Z44808_PEA_1_P6 (SEQ ID NO: 1315) × SMO2_HUMAN (SEQ ID NO: 1430)
Alignment segment 1/1:
Quality: 4310.00
Escore: 0
Matching length: 428
Total length: 428
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

Sequence name: /tmp/MZVdR4PVdM/5uN8RwViJ1:SMO2_HUMAN (SEQ ID NO: 1430)
Sequence documentation: Alignment of: Z44808_PEA_1_P7 (SEQ ID NO: 1316) × SMO2_HUMAN (SEQ ID NO: 1430)
Alignment segment 1/1:
Quality: 4440.00
Escore: 0
Matching length: 441
Total length: 441
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

Sequence name: /tmp/3fGVxqLloe/J5mQduAd0F:SMO2_HUMAN (SEQ ID NO: 1430)
Sequence documentation: Alignment of: Z44808_PEA_1_P11 (SEQ ID NO: 1317) × SMO2_HUMAN (SEQ ID NO: 1430) . . .
Alignment segment 1/1:
Quality: 4228.00
Escore: 0
Matching length: 429
Total length: 446
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 96.19
Total Percent Identity: 96.19
Gaps: 1
Alignment:
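The relationship between the "Matching" and "Total" figures in the alignment summaries above can be illustrated with a short sketch. It assumes, without confirmation from the alignment software itself, that the total percent identity is the number of identical positions divided by the total alignment length, and that a matching percent identity of 100.00 means every position within the matching length is identical. The function name `total_percent_identity` is hypothetical.

```python
def total_percent_identity(matching_length, total_length, matching_identity=100.0):
    """Percent identity over the full alignment length, rounded to 2 decimals.

    Assumption: identical positions = matching_length * matching_identity / 100.
    """
    if total_length <= 0:
        raise ValueError("total_length must be positive")
    identical = matching_length * matching_identity / 100.0
    return round(100.0 * identical / total_length, 2)


if __name__ == "__main__":
    # Z44808_PEA_1_P11 against SMO2_HUMAN: matching length 429, total length 446.
    print(total_percent_identity(429, 446))
    # Z44808_PEA_1_P5 against SMO2_HUMAN: matching length 441, total length 441.
    print(total_percent_identity(441, 441))
```

Under this assumption the reported value of 96.19 for Z44808_PEA_1_P11 is reproduced from the matching and total lengths alone.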

Expression of SMO2_HUMAN SPARC related modular calcium-binding protein 2 precursor Z44808 transcripts which are detectable by amplicon as depicted in sequence name Z44808junc8-11 (SEQ ID NO: 1651) in normal and cancerous lung tissues

Expression of SMO2_HUMAN SPARC related modular calcium-binding protein 2 precursor (Secreted modular calcium-binding protein 2) (SMOC-2) (Smooth muscle-associated protein 2) transcripts detectable by or according to junc8-11, Z44808junc8-11 amplicon (SEQ ID NO: 1651) and Z44808junc8-11F (SEQ ID NO: 1649) and Z44808junc8-11R (SEQ ID NO: 1650) primers, was measured by real time PCR. In parallel, the expression of four housekeeping genes, namely PBGD (GenBank Accession No. BC019323 (SEQ ID NO:1713); amplicon: PBGD-amplicon, SEQ ID NO:334), HPRT1 (GenBank Accession No. NM_000194 (SEQ ID NO:1714); amplicon: HPRT1-amplicon, SEQ ID NO:1297), Ubiquitin (GenBank Accession No. BC000449 (SEQ ID NO:1711); amplicon: Ubiquitin-amplicon, SEQ ID NO:328) and SDHA (GenBank Accession No. NM_004168 (SEQ ID NO:1712); amplicon: SDHA-amplicon, SEQ ID NO:331), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 47-50, 90-93, 96-99, Table 2, “Tissue samples in testing panel”, above), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples.
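The normalization just described can be sketched as follows. This is a minimal illustration under stated assumptions rather than the procedure actually used: the quantities are taken to be linear-scale relative quantities already derived from the real time PCR readout, and the function names and sample identifiers are hypothetical.

```python
from statistics import geometric_mean, median


def normalize(target_quantity, housekeeping_quantities):
    """Divide the amplicon quantity by the geometric mean of the
    housekeeping gene quantities measured in the same RT sample."""
    return target_quantity / geometric_mean(housekeeping_quantities)


def fold_up_regulation(normalized, normal_sample_ids):
    """Divide every normalized quantity by the median of the normal samples.

    `normalized` maps sample id -> normalized quantity;
    `normal_sample_ids` lists the ids of the normal (PM) samples.
    """
    baseline = median(normalized[s] for s in normal_sample_ids)
    return {sample: value / baseline for sample, value in normalized.items()}


if __name__ == "__main__":
    # Hypothetical quantities for illustration only (not measured data).
    normalized = {
        "normal_A": normalize(2.0, [1.0, 4.0, 2.0, 2.0]),
        "normal_B": normalize(4.0, [1.0, 4.0, 2.0, 2.0]),
        "normal_C": normalize(6.0, [1.0, 4.0, 2.0, 2.0]),
        "tumor_X": normalize(20.0, [1.0, 4.0, 2.0, 2.0]),
    }
    print(fold_up_regulation(normalized, ["normal_A", "normal_B", "normal_C"]))
```

Using the geometric mean keeps any single highly expressed housekeeping gene from dominating the normalization factor, and using the median of the normal samples makes the baseline less sensitive to an outlying normal sample.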

FIG. 29 is a histogram showing over-expression of the above-indicated SMO2_HUMAN SPARC related modular calcium-binding protein 2 precursor transcripts in cancerous lung samples relative to the normal samples.

As is evident from FIG. 29, the expression of SMO2_HUMAN SPARC related modular calcium-binding protein 2 precursor transcripts detectable by the above amplicon in several cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 47-50, 90-93, 96-99, Table 2, “Tissue samples in testing panel”).

Notably, an over-expression of at least 5 fold was found in 2 out of 15 adenocarcinoma samples and in 3 out of 8 small cell carcinoma samples.

Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: Z44808junc8-11F forward primer (SEQ ID NO: 1649); and Z44808junc8-11R reverse primer (SEQ ID NO: 1650).

The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: Z44808junc8-11 (SEQ ID NO: 1651).

Forward primer (SEQ ID NO: 1649): GAAGGCACAGGAAAAACAGATATTG
Reverse primer (SEQ ID NO: 1650): TGGTGCTCTTGGTCACAGGAT
Amplicon (SEQ ID NO: 1651): GAAGGCACAGGAAAAACAGATATTGCATCACGTTACCCTACCCTTTGGACTGAACAGGTTAAAAGTCGGCAGAACAAAACCAATAAGAATTCAGTGTCATCCTGTGACCAAGAGCACCA
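As a consistency check on the sequences listed above, the following sketch verifies that the amplicon begins with the forward primer and ends with the reverse complement of the reverse primer, which is what is expected of a PCR product flanked by that primer pair. The helper names are hypothetical; the sequences are copied from the text, with the stray space in the printed amplicon removed.

```python
# Complement table for unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

FORWARD = "GAAGGCACAGGAAAAACAGATATTG"  # SEQ ID NO: 1649
REVERSE = "TGGTGCTCTTGGTCACAGGAT"      # SEQ ID NO: 1650
AMPLICON = (                           # SEQ ID NO: 1651
    "GAAGGCACAGGAAAAACAGATATTGCATCACGTTACCCTACCCTTTGGACTGAACAGGTTAAAAGTCGGC"
    "AGAACAAAACCAATAAGAATTCAGTGTCAT"
    "CCTGTGACCAAGAGCACCA"
)


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def primers_flank_amplicon(forward, reverse, amplicon):
    """True if the amplicon starts with the forward primer and ends with
    the reverse complement of the reverse primer."""
    return amplicon.startswith(forward) and amplicon.endswith(
        reverse_complement(reverse)
    )


if __name__ == "__main__":
    print(reverse_complement(REVERSE))
    print(primers_flank_amplicon(FORWARD, REVERSE, AMPLICON))
```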

Expression of SMO2_HUMAN SPARC related modular calcium-binding protein 2 precursor (Secreted modular calcium-binding protein 2) (SMOC-2) (Smooth muscle-associated protein 2) Z44808 transcripts which are detectable by amplicon as depicted in sequence name Z44808junc8-11 (SEQ ID NO: 1651) in different normal tissues

Expression of SMO2_HUMAN SPARC related modular calcium-binding protein 2 precursor (Secreted modular calcium-binding protein 2) (SMOC-2) (Smooth muscle-associated protein 2) transcripts detectable by or according to Z44808junc8-11 amplicon (SEQ ID NO: 1651) and primers Z44808junc8-11F (SEQ ID NO: 1649) and Z44808junc8-11R (SEQ ID NO: 1650) was measured by real time PCR. In parallel, the expression of four housekeeping genes, namely RPL19 (GenBank Accession No. NM_000981 (SEQ ID NO:1715); RPL19 amplicon, SEQ ID NO:1630), TATA box (GenBank Accession No. NM_003194 (SEQ ID NO:1716); TATA amplicon, SEQ ID NO:1633), Ubiquitin (GenBank Accession No. BC000449 (SEQ ID NO:1711); amplicon: Ubiquitin-amplicon, SEQ ID NO:328) and SDHA (GenBank Accession No. NM_004168 (SEQ ID NO:1712); amplicon: SDHA-amplicon, SEQ ID NO:331), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the ovary samples (Sample Nos. 18-20, Table 3), to obtain a value of expression of each sample relative to the median of the ovary samples.

Primers:
Forward primer (SEQ ID NO: 1649): GAAGGCACAGGAAAAACAGATATTG
Reverse primer (SEQ ID NO: 1650): TGGTGCTCTTGGTCACAGGAT
Amplicon (SEQ ID NO: 1651): GAAGGCACAGGAAAAACAGATATTGCATCACGTTACCCTACCCTTTGGACTGAACAGGTTAAAAGTCGGCAGAACAAAACCAATAAGAATTCAGTGTCATCCTGTGACCAAGAGCACCA

The results are demonstrated in FIG. 18, showing the expression of SMO2_HUMAN SPARC related modular calcium-binding protein 2 precursor (Secreted modular calcium-binding protein 2) (SMOC-2) (Smooth muscle-associated protein 2) Z44808 transcripts which are detectable by amplicon as depicted in sequence name Z44808junc8-11 (SEQ ID NO: 1651) in different normal tissues.

Description for Cluster AA161187

Cluster AA161187 features 7 transcript(s) and 20 segment(s) of interest, the names for which are given in Tables 370 and 371, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 372.

TABLE 370 Transcripts of interest

Transcript Name | Sequence ID No.
AA161187_T0 | 41
AA161187_T7 | 42
AA161187_T15 | 43
AA161187_T16 | 44
AA161187_T20 | 45
AA161187_T21 | 46
AA161187_T22 | 47

TABLE 371 Segments of interest

Segment Name | Sequence ID No.
AA161187_node_0 | 482
AA161187_node_6 | 483
AA161187_node_14 | 484
AA161187_node_16 | 485
AA161187_node_25 | 486
AA161187_node_26 | 487
AA161187_node_28 | 488
AA161187_node_4 | 489
AA161187_node_7 | 490
AA161187_node_8 | 491
AA161187_node_9 | 492
AA161187_node_10 | 493
AA161187_node_12 | 494
AA161187_node_13 | 495
AA161187_node_19 | 496
AA161187_node_20 | 497
AA161187_node_21 | 498
AA161187_node_22 | 499
AA161187_node_23 | 500
AA161187_node_24 | 501

TABLE 372 Proteins of interest

Protein Name | Sequence ID No. | Corresponding Transcript(s)
AA161187_P1 | 1318 | AA161187_T0 (SEQ ID NO: 41)
AA161187_P6 | 1319 | AA161187_T7 (SEQ ID NO: 42)
AA161187_P13 | 1320 | AA161187_T15 (SEQ ID NO: 43)
AA161187_P14 | 1321 | AA161187_T16 (SEQ ID NO: 44)
AA161187_P18 | 1322 | AA161187_T20 (SEQ ID NO: 45)
AA161187_P19 | 1323 | AA161187_T21 (SEQ ID NO: 46)

These sequences are variants of the known protein Testisin precursor (SwissProt accession identifier TEST_HUMAN; known also according to the synonyms EC 3.4.21.-; Eosinophil serine protease 1; ESP-1; UNQ266/PRO303), SEQ ID NO: 1431, referred to herein as the previously known protein.

Protein Testisin precursor (SEQ ID NO:1431) is known or believed to have the following function(s): Could regulate proteolytic events associated with testicular germ cell maturation. The sequence for protein Testisin precursor is given at the end of the application, as “Testisin precursor amino acid sequence”. Protein Testisin precursor localization is believed to be attached to the membrane by a GPI-anchor.

The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: serine-type peptidase, which are annotation(s) related to Molecular Function; and membrane fraction; cytoplasm; plasma membrane, which are annotation(s) related to Cellular Component.

The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <dot expasy dot ch/sprot/>; or Locuslink, available from <dot ncbi dot nlm dot nih dot gov/projects/LocusLink/>.

Cluster AA161187 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term “number” in the left hand column of the table and the numbers on the y-axis of FIG. 30 refer to weighted expression of ESTs in each category, as “parts per million” (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million). Overall, the following results were obtained as shown with regard to the histograms in FIG. 30 and Table 373. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: brain malignant tumors, epithelial malignant tumors and a mixture of malignant tumors from different tissues.

TABLE 373 Normal tissue distribution

Name of Tissue | Number
bone | 0
brain | 1
colon | 0
epithelial | 0
general | 0
lung | 0
breast | 0
bone marrow | 0
ovary | 0
pancreas | 0
prostate | 4
stomach | 0
uterus | 0

TABLE 374 P values and ratios for expression in cancerous tissue

Name of Tissue | P1 | P2 | SP1 | R3 | SP2 | R4
bone | 1 | 6.7e−01 | 1 | 1.0 | 3.4e−01 | 1.9
brain | 9.8e−01 | 6.0e−01 | 1 | 0.7 | 3.8e−03 | 3.6
colon | 4.4e−01 | 5.0e−01 | 7.0e−01 | 1.5 | 7.7e−01 | 1.3
epithelial | 1.3e−02 | 2.6e−03 | 1.7e−03 | 8.4 | 2.4e−04 | 7.9
general | 1.6e−03 | 1.9e−05 | 1.9e−05 | 12.1 | 2.9e−10 | 15.6
lung | 5.0e−01 | 6.3e−01 | 1.7e−01 | 3.9 | 3.8e−01 | 2.2
breast | 1 | 6.7e−01 | 1 | 1.0 | 8.2e−01 | 1.2
bone marrow | 1 | 4.2e−01 | 1 | 1.0 | 1.5e−01 | 2.9
ovary | 6.2e−01 | 6.5e−01 | 4.7e−01 | 1.9 | 5.9e−01 | 1.6
pancreas | 1 | 4.4e−01 | 1 | 1.0 | 2.8e−01 | 2.8
prostate | 5.9e−01 | 5.9e−01 | 1.4e−01 | 2.9 | 2.4e−01 | 2.3
stomach | 1 | 4.7e−01 | 1 | 1.0 | 6.4e−01 | 1.5
uterus | 1 | 2.4e−01 | 1 | 1.0 | 1.7e−01 | 2.0
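The “parts per million” figure used for the normal tissue distribution can be illustrated with a simple ratio. The sketch below is an assumption-laden simplification: it computes an unweighted ratio of EST counts, whereas the text refers to a weighted expression whose weighting scheme is described elsewhere in the application. The function name and the counts in the usage example are hypothetical and are not data from the application.

```python
def parts_per_million(cluster_est_count, category_est_count):
    """ESTs belonging to one cluster per million ESTs in a tissue category.

    Assumption: a plain, unweighted ratio of counts.
    """
    if category_est_count <= 0:
        raise ValueError("category_est_count must be positive")
    return 1_000_000 * cluster_est_count / category_est_count


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    print(parts_per_million(2, 500_000))
```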

As noted above, cluster AA161187 features transcript(s), which were listed in Table 370 above. These transcript(s) encode for protein(s) which are variant(s) of protein Testisin precursor (SEQ ID NO:1431). A description of each variant protein according to the present invention is now provided.

Variant protein AA161187_P1 (SEQ ID NO:1318) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) AA161187_T0 (SEQ ID NO:41). The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide.

Variant protein AA161187_P1 (SEQ ID NO:1318) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 375 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein AA161187_P1 (SEQ ID NO:1318) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 375 Amino acid mutations

SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
1 | M -> | No
16 | A -> | No
226 | N -> | No
253 | I -> V | No
255 | V -> I | No
264 | R -> | No
264 | R -> P | No
264 | R -> Q | Yes

Variant protein AA161187_P1 (SEQ ID NO:1318) is encoded by the following transcript(s): AA161187_T0 (SEQ ID NO:41), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript AA161187_T0 (SEQ ID NO:41) is shown in bold; this coding portion starts at position 107 and ends at position 1048. The transcript also has the following SNPs as listed in Table 376 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein AA161187_P1 (SEQ ID NO:1318) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 376 Nucleic acid SNPs

SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
66 | T -> A | No
67 | T -> G | No
105 | C -> T | No
108 | T -> | No
154 | T -> | No
190 | C -> G | No
469 | A -> G | Yes
571 | C -> T | Yes
782 | A -> | No
859 | T -> C | Yes
863 | A -> G | No
869 | G -> A | No
897 | G -> | No
897 | G -> A | Yes
897 | G -> C | No
1000 | A -> G | Yes
1068 | G -> | No
1068 | G -> A | No
1069 | C -> A | No
1168 | A -> G | Yes
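The nucleotide positions in Table 376 and the amino acid positions in Table 375 can be related through the stated coding portion of transcript AA161187_T0 (positions 107 to 1048). The following sketch is illustrative only and assumes that position 107 is the first base of the first codon and that the reading frame is uninterrupted; `amino_acid_position` is a hypothetical helper.

```python
def amino_acid_position(nt_position, cds_start, cds_end):
    """Map a 1-based transcript position to a 1-based amino acid position.

    Returns None for positions outside the coding portion.
    Assumption: cds_start is the first base of codon 1, with no frame shifts.
    """
    if not cds_start <= nt_position <= cds_end:
        return None
    return (nt_position - cds_start) // 3 + 1


if __name__ == "__main__":
    # Nucleotide SNP positions taken from Table 376 for AA161187_T0.
    for nt in (66, 108, 154, 782, 863, 869, 897, 1068):
        print(nt, amino_acid_position(nt, 107, 1048))
```

Under these assumptions the coding-region positions 108, 154, 782, 863, 869 and 897 map to amino acid positions 1, 16, 226, 253, 255 and 264, matching the positions listed in Table 375, while positions such as 66 and 1068 fall outside the coding portion.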

Variant protein AA161187_P6 (SEQ ID NO:1319) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) AA161187_T7 (SEQ ID NO:42). An alignment is given to the known protein (Testisin precursor (SEQ ID NO:1431)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between AA161187_P6 (SEQ ID NO:1319) and TEST_HUMAN (SEQ ID NO:1431):

1. An isolated chimeric polypeptide encoding for AA161187_P6 (SEQ ID NO:1319), comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence HTREGTLGGQKRAFPDGVEGEKGRGRAWGAASRGSAVPLTIR (SEQ ID NO: 273) corresponding to amino acids 1-42 of AA161187_P6 (SEQ ID NO:1319), and a second amino acid sequence being at least 90% homologous to GPCGRRVITSRIVGGEDAELGRWPWQGSLRLWDSHVCGVSLLSHRWALTAAHCFETYSDLSDPSGWMVQFGQLTSMPSFWSLQAYYTRYFVSNIYLSPRYLGNSPYDIALVKLSAPVTYTKHIQPICLQASTFEFENRTDCWVTGWGYIKEDEALPSPHTLQEVQVAIINNSMCNHLFLKYSFRKDIFGDMVCAGNAQGGKDACFGDSGGPLACNKNGLWYQIGVVSWGVGCGRPNRPGVYTNISHHFEWIQKLMAQSGMSQPDPSWPLLFFPLLWALPLLGPV corresponding to amino acids 31-314 of TEST_HUMAN (SEQ ID NO:1431), which also corresponds to amino acids 43-326 of AA161187_P6 (SEQ ID NO:1319), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a head of AA161187_P6 (SEQ ID NO:1319), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence HTREGTLGGQKRAFPDGVEGEKGRGRAWGAASRGSAVPLTIR (SEQ ID NO: 273) of AA161187_P6 (SEQ ID NO:1319).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because, although it is a partial protein, both trans-membrane region prediction programs predict that this protein has a trans-membrane region.

Variant protein AA161187_P6 (SEQ ID NO:1319) also has the followingnon-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table377, (given according to their position(s) on the amino acid sequence,with the alternative amino acid(s) listed; the last column indicateswhether the SNP is known or not; the presence of known SNPs in variantprotein AA161187_P6 (SEQ ID NO:1319) sequence provides support for thededuced sequence of this variant protein according to the presentinvention).

TABLE 377 Amino acid mutations SNP position(s) on amino acid Alternativesequence amino acid(s) Previously known SNP? 238 N -> No 265 I -> V No267 V -> I No 276 R -> No 276 R -> P No 276 R -> Q Yes

The glycosylation sites of variant protein AA161187_P6 (SEQ ID NO:1319),as compared to the known protein Testisin precursor (SEQ ID NO:1431),are described in Table 378 (given according to their position(s) on theamino acid sequence in the first column; the second column indicateswhether the glycosylation site is present in the variant protein; andthe last column indicates whether the position is different on thevariant protein).

TABLE 378 Glycosylation site(s)
Position(s) on known    Present in          Position in
amino acid sequence     variant protein?    variant protein?
200                     yes                 212
167                     yes                 179
273                     yes                 285

Variant protein AA161187_P6 (SEQ ID NO:1319) is encoded by the following transcript(s): AA161187_T7 (SEQ ID NO:42), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript AA161187_T7 (SEQ ID NO:42) is shown in bold; this coding portion starts at position 1 and ends at position 979. The transcript also has the following SNPs as listed in Table 379 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein AA161187_P6 (SEQ ID NO:1319) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 379 Nucleic acid SNPs
SNP position on         Alternative       Previously
nucleotide sequence     nucleic acid      known SNP?
400                     A -> G            Yes
502                     C -> T            Yes
713                     A ->              No
790                     T -> C            Yes
794                     A -> G            No
800                     G -> A            No
828                     G ->              No
828                     G -> A            Yes
828                     G -> C            No
931                     A -> G            Yes
999                     G ->              No
999                     G -> A            No
1000                    C -> A            No
1099                    A -> G            Yes

Variant protein AA161187_P13 (SEQ ID NO:1320) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) AA161187_T15 (SEQ ID NO:43). An alignment is given to the known protein (Testisin precursor (SEQ ID NO:1431)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between AA161187_P13 (SEQ ID NO:1320) and TEST_HUMAN (SEQ ID NO:1431):

1. An isolated chimeric polypeptide encoding for AA161187_P13 (SEQ ID NO:1320), comprising a first amino acid sequence being at least 90% homologous to MGARGALLLALLLARAGLRKPESQEAAPLSGPCGRRVITSRIVGGEDAELGRWPWQGSLRLWDSHVCGVSLLSHRWALTAAHCFETYSDLSDPSGWMVQFGQLTSMPSFWSLQAYYTRYFVSNIYLSPRYLGNSPYDIALVKLSAPVTYTKHIQPICLQASTFEFENRTDCWVTGWGYIKEDE corresponding to amino acids 1-183 of TEST_HUMAN (SEQ ID NO:1431), which also corresponds to amino acids 1-183 of AA161187_P13 (SEQ ID NO:1320), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GSSGRHHKQLYVQPPLPQVQFPQGHLWRHG (SEQ ID NO: 274) corresponding to amino acids 184-213 of AA161187_P13 (SEQ ID NO:1320), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of AA161187_P13 (SEQ ID NO:1320), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GSSGRHHKQLYVQPPLPQVQFPQGHLWRHG (SEQ ID NO: 274) in AA161187_P13 (SEQ ID NO:1320).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein AA161187_P13 (SEQ ID NO:1320) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 380 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein AA161187_P13 (SEQ ID NO:1320) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 380 Amino acid mutations
SNP position(s) on      Alternative       Previously
amino acid sequence     amino acid(s)     known SNP?
1                       M ->              No
16                      A ->              No

The glycosylation sites of variant protein AA161187_P13 (SEQ ID NO:1320), as compared to the known protein Testisin precursor (SEQ ID NO:1431), are described in Table 381 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).

TABLE 381 Glycosylation site(s)
Position(s) on known    Present in          Position in
amino acid sequence     variant protein?    variant protein?
200                     no
167                     yes                 167
273                     no

Variant protein AA161187_P13 (SEQ ID NO:1320) is encoded by the following transcript(s): AA161187_T15 (SEQ ID NO:43), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript AA161187_T15 (SEQ ID NO:43) is shown in bold; this coding portion starts at position 107 and ends at position 745. The transcript also has the following SNPs as listed in Table 382 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein AA161187_P13 (SEQ ID NO:1320) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 382 Nucleic acid SNPs
SNP position on         Alternative       Previously
nucleotide sequence     nucleic acid      known SNP?
66                      T -> A            No
67                      T -> G            No
105                     C -> T            No
108                     T ->              No
154                     T ->              No
190                     C -> G            No
469                     A -> G            Yes
571                     C -> T            Yes
791                     T -> C            Yes
795                     A -> G            No
801                     G -> A            No
829                     G ->              No
829                     G -> A            Yes
829                     G -> C            No
932                     A -> G            Yes
1000                    G ->              No
1000                    G -> A            No
1001                    C -> A            No
1100                    A -> G            Yes
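The relationship between the nucleotide positions of Table 382 and the amino acid positions of Table 380 can be checked arithmetically. The following Python sketch is illustrative only and is not part of the disclosed method. The function name `codon_number` is hypothetical, and the sketch assumes 1-based positions and a reading frame that begins at the first stated position of the coding portion (107 for AA161187_T15). Under that assumption, nucleotide positions 108 and 154 fall in codons 1 and 16, which are the amino acid positions listed in Table 380, while positions 66, 67 and 105 lie before the coding portion.

```python
from typing import Optional


def codon_number(nt_pos: int, cds_start: int, cds_end: int) -> Optional[int]:
    """Return the 1-based codon (amino acid) number that contains nt_pos.

    Assumption (not stated in the source): positions are 1-based and the
    reading frame starts exactly at cds_start. Returns None when nt_pos
    lies outside the coding portion.
    """
    if nt_pos < cds_start or nt_pos > cds_end:
        return None
    return (nt_pos - cds_start) // 3 + 1


# Coding portion of AA161187_T15 as stated in the text: 107 to 745.
CDS_START, CDS_END = 107, 745

for pos in (66, 105, 108, 154):
    print(pos, codon_number(pos, CDS_START, CDS_END))
```

The same arithmetic places the last coding position, 745, in codon 213, consistent with the 213 amino acids of AA161187_P13 described in the comparison report.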

Variant protein AA161187_P14 (SEQ ID NO:1321) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) AA161187_T16 (SEQ ID NO:44). An alignment is given to the known protein (Testisin precursor (SEQ ID NO:1431)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between AA161187_P14 (SEQ ID NO:1321) and TEST_HUMAN (SEQ ID NO:1431):

1. An isolated chimeric polypeptide encoding for AA161187_P14 (SEQ ID NO:1321), comprising a first amino acid sequence being at least 90% homologous to MGARGALLLALLLARAGLRKPESQEAAPLSGPCGRRVITSRIVGGEDAELGRWPWQGSLRLWDSHVCGVSLLSHRWALTAAHCFETYSDLSDPSGWMVQFGQLTSMPSFWSLQAYYTRYFVSNIYLSPRYLGNSPYDIALVKLSAPVTYTKHIQPICLQASTFEFENRTDCWVTGWGYIKEDE corresponding to amino acids 1-183 of TEST_HUMAN (SEQ ID NO:1431), which also corresponds to amino acids 1-183 of AA161187_P14 (SEQ ID NO:1321), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GCCLSPSHYRPHSTAISPHPPGSSGRHHKQLYVQPPLPQVQFPQGHLWRHGLCWQCPRREGCLLRECPCHHSQPRKASCVPVPYLTLMPTPGGGDCCPTLQMQKRRLGCCQGEEEDVHPVYPAP (SEQ ID NO: 275) corresponding to amino acids 184-307 of AA161187_P14 (SEQ ID NO:1321), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of AA161187_P14 (SEQ ID NO:1321), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GCCLSPSHYRPHSTAISPHPPGSSGRHHKQLYVQPPLPQVQFPQGHLWRHGLCWQCPRREGCLLRECPCHHSQPRKASCVPVPYLTLMPTPGGGDCCPTLQMQKRRLGCCQGEEEDVHPVYPAP (SEQ ID NO: 275) in AA161187_P14 (SEQ ID NO:1321).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein AA161187_P14 (SEQ ID NO:1321) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 383 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein AA161187_P14 (SEQ ID NO:1321) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 383 Amino acid mutations
SNP position(s) on      Alternative       Previously
amino acid sequence     amino acid(s)     known SNP?
1                       M ->              No
16                      A ->              No
238                     Q ->              No

The glycosylation sites of variant protein AA161187_P14 (SEQ ID NO:1321), as compared to the known protein Testisin precursor (SEQ ID NO:1431), are described in Table 384 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).

TABLE 384 Glycosylation site(s)
Position(s) on known    Present in          Position in
amino acid sequence     variant protein?    variant protein?
200                     no
167                     yes                 167
273                     no

Variant protein AA161187_P14 (SEQ ID NO:1321) is encoded by the following transcript(s): AA161187_T16 (SEQ ID NO:44), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript AA161187_T16 (SEQ ID NO:44) is shown in bold; this coding portion starts at position 107 and ends at position 1027. The transcript also has the following SNPs as listed in Table 385 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein AA161187_P14 (SEQ ID NO:1321) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 385 Nucleic acid SNPs
SNP position on         Alternative       Previously
nucleotide sequence     nucleic acid      known SNP?
66                      T -> A            No
67                      T -> G            No
105                     C -> T            No
108                     T ->              No
154                     T ->              No
190                     C -> G            No
469                     A -> G            Yes
571                     C -> T            Yes
819                     A ->              No
859                     C -> T            Yes
1152                    T -> C            Yes
1156                    A -> G            No
1162                    G -> A            No
1190                    G ->              No
1190                    G -> A            Yes
1190                    G -> C            No
1293                    A -> G            Yes
1361                    G ->              No
1361                    G -> A            No
1362                    C -> A            No
1461                    A -> G            Yes

Variant protein AA161187_P18 (SEQ ID NO:1322) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) AA161187_T20 (SEQ ID NO:45). An alignment is given to the known protein (Testisin precursor (SEQ ID NO:1431)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between AA161187_P18 (SEQ ID NO:1322) and TEST_HUMAN (SEQ ID NO:1431):

1. An isolated chimeric polypeptide encoding for AA161187_P18 (SEQ ID NO:1322), comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence HTREGTLGGQKRAFPDGVEGEKGRGRAWGAASRGSAVPLTIR (SEQ ID NO: 273) corresponding to amino acids 1-42 of AA161187_P18 (SEQ ID NO:1322), a second amino acid sequence being at least 90% homologous to GPCGRRVITSRIVGGEDAELGRWPWQGSLRLWDSHVCGVSLLSHRWALTAAHCFET corresponding to amino acids 31-86 of TEST_HUMAN (SEQ ID NO:1431), which also corresponds to amino acids 43-98 of AA161187_P18 (SEQ ID NO:1322), a third amino acid sequence being at least 90% homologous to DLSDPSGWMVQFGQLTSMPSFWSLQAYYTRYFVSNIYLSPRYLGNSPYDIALVKLSAPVTYTKHIQPICLQASTFEFENRTDCWVTGWGYIKEDEALPSPHTLQEVQVAIINNSMCNHLFLKYSFRKDIFGDMVCAGNAQGGKDACF corresponding to amino acids 89-235 of TEST_HUMAN (SEQ ID NO:1431), which also corresponds to amino acids 99-245 of AA161187_P18 (SEQ ID NO:1322), and a fourth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VSVPATTPSPGKHPVSLCLI (SEQ ID NO: 277) corresponding to amino acids 246-265 of AA161187_P18 (SEQ ID NO:1322), wherein said first amino acid sequence, second amino acid sequence, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a head of AA161187_P18 (SEQ ID NO:1322), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence HTREGTLGGQKRAFPDGVEGEKGRGRAWGAASRGSAVPLTIR (SEQ ID NO: 273) of AA161187_P18 (SEQ ID NO:1322).

3. An isolated chimeric polypeptide encoding for an edge portion of AA161187_P18 (SEQ ID NO:1322), comprising a polypeptide having a length “n”, wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise TD, having a structure as follows: a sequence starting from any of amino acid numbers 98−x to 98; and ending at any of amino acid numbers 99+((n−2)−x), in which x varies from 0 to n−2.

4. An isolated polypeptide encoding for a tail of AA161187_P18 (SEQ ID NO:1322), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VSVPATTPSPGKHPVSLCLI (SEQ ID NO: 277) in AA161187_P18 (SEQ ID NO:1322).
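The edge portion defined in item 3 above can be enumerated explicitly. The following Python sketch is an illustration under one reading of that definition, namely that for each x from 0 to n−2 the sequence starts at amino acid 98−x and ends at amino acid 99+((n−2)−x). The function name `edge_windows` and the default junction position argument are hypothetical. Under this reading every window has length n and spans both junction residues, positions 98 and 99.

```python
from typing import List, Tuple


def edge_windows(n: int, junction: int = 98) -> List[Tuple[int, int]]:
    """Enumerate (start, end) positions, 1-based and inclusive, of all
    length-n windows spanning positions `junction` and `junction + 1`.

    Assumed reading of the text: start = junction - x and
    end = (junction + 1) + ((n - 2) - x), for x = 0 .. n - 2.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to span both junction residues")
    return [(junction - x, junction + 1 + (n - 2) - x) for x in range(n - 1)]


windows = edge_windows(10)
for start, end in windows:
    # Every window has exactly n residues and contains positions 98 and 99.
    assert end - start + 1 == 10 and start <= 98 and end >= 99
print(len(windows), windows[0], windows[-1])
```

For n = 10 the enumeration yields nine windows, from positions 98 to 107 (x = 0) through positions 90 to 99 (x = 8).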

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because, although it is a partial protein, both trans-membrane region prediction programs predict that this protein has a trans-membrane region.

Variant protein AA161187_P18 (SEQ ID NO:1322) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 386 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein AA161187_P18 (SEQ ID NO:1322) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 386 Amino acid mutations
SNP position(s) on      Alternative       Previously
amino acid sequence     amino acid(s)     known SNP?
236                     N ->              No
249                     P -> L            Yes

The glycosylation sites of variant protein AA161187_P18 (SEQ ID NO:1322), as compared to the known protein Testisin precursor (SEQ ID NO:1431), are described in Table 387 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).

TABLE 387 Glycosylation site(s)
Position(s) on known    Present in          Position in
amino acid sequence     variant protein?    variant protein?
200                     yes                 210
167                     yes                 177
273                     no
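The positions in Table 387 follow from the aligned ranges stated in the comparison report for AA161187_P18: amino acids 31-86 of TEST_HUMAN correspond to amino acids 43-98 of the variant, and amino acids 89-235 correspond to amino acids 99-245. The Python sketch below is illustrative only and maps coordinates, nothing more. The names `P18_SEGMENTS` and `to_variant_position` are hypothetical, and the sketch assumes 1-based inclusive ranges with no insertions or deletions inside an aligned range. It does not test whether a glycosylation motif is actually preserved.

```python
from typing import List, Optional, Tuple

# (known_start, known_end, variant_start), 1-based inclusive, taken from
# the comparison report for AA161187_P18 versus TEST_HUMAN.
P18_SEGMENTS: List[Tuple[int, int, int]] = [
    (31, 86, 43),
    (89, 235, 99),
]


def to_variant_position(known_pos: int,
                        segments: List[Tuple[int, int, int]]) -> Optional[int]:
    """Map a position on the known protein to the variant protein.

    Returns None when the position falls outside every aligned range,
    i.e. the residue is not retained in the variant.
    """
    for known_start, known_end, variant_start in segments:
        if known_start <= known_pos <= known_end:
            return variant_start + (known_pos - known_start)
    return None


for site in (200, 167, 273):
    print(site, to_variant_position(site, P18_SEGMENTS))
```

Sites 200 and 167 map to positions 210 and 177, as listed in Table 387, and site 273 lies beyond the last aligned range, consistent with its absence from the variant.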

Variant protein AA161187_P18 (SEQ ID NO:1322) is encoded by the following transcript(s): AA161187_T20 (SEQ ID NO:45), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript AA161187_T20 (SEQ ID NO:45) is shown in bold; this coding portion starts at position 1 and ends at position 796. The transcript also has the following SNPs as listed in Table 388 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein AA161187_P18 (SEQ ID NO:1322) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 388 Nucleic acid SNPs
SNP position on         Alternative       Previously
nucleotide sequence     nucleic acid      known SNP?
394                     A -> G            Yes
496                     C -> T            Yes
707                     A ->              No
747                     C -> T            Yes
1040                    T -> C            Yes
1044                    A -> G            No
1050                    G -> A            No
1078                    G ->              No
1078                    G -> A            Yes
1078                    G -> C            No
1181                    A -> G            Yes
1249                    G ->              No
1249                    G -> A            No
1250                    C -> A            No
1349                    A -> G            Yes

Variant protein AA161187_P19 (SEQ ID NO:1323) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) AA161187_T21 (SEQ ID NO:46). An alignment is given to the known protein (Testisin precursor (SEQ ID NO:1431)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between AA161187_P19 (SEQ ID NO:1323) and TEST_HUMAN (SEQ ID NO:1431):

1. An isolated chimeric polypeptide encoding for AA161187_P19 (SEQ ID NO:1323), comprising a first amino acid sequence being at least 90% homologous to MGARGALLLALLLARAGLRKPESQEAAPLSGPCGRRVITSRIVGGEDAELGRWPWQGSLRLWDSHVCGVSLLSHRWALTAAHCFETYSDLSDPSGWMVQFGQLTSMPSFWSLQAYYTRYFVSNIYLSPRYLGNSPYDIALVKLSAPVTYTKHIQPICLQASTFEFENRTDCWVTGWGYIKEDE corresponding to amino acids 1-183 of TEST_HUMAN (SEQ ID NO:1431), which also corresponds to amino acids 1-183 of AA161187_P19 (SEQ ID NO:1323), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DKRTQ (SEQ ID NO: 278) corresponding to amino acids 184-188 of AA161187_P19 (SEQ ID NO:1323), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of AA161187_P19 (SEQ ID NO:1323), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DKRTQ (SEQ ID NO: 278) in AA161187_P19 (SEQ ID NO:1323).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein AA161187_P19 (SEQ ID NO:1323) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 389 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein AA161187_P19 (SEQ ID NO:1323) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 389 Amino acid mutations
SNP position(s) on      Alternative       Previously
amino acid sequence     amino acid(s)     known SNP?
1                       M ->              No
16                      A ->              No

The glycosylation sites of variant protein AA161187_P19 (SEQ ID NO:1323), as compared to the known protein Testisin precursor (SEQ ID NO:1431), are described in Table 390 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).

TABLE 390 Glycosylation site(s)
Position(s) on known    Present in          Position in
amino acid sequence     variant protein?    variant protein?
200                     no
167                     yes                 167
273                     no

Variant protein AA161187_P19 (SEQ ID NO:1323) is encoded by the following transcript(s): AA161187_T21 (SEQ ID NO:46), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript AA161187_T21 (SEQ ID NO:46) is shown in bold; this coding portion starts at position 107 and ends at position 670. The transcript also has the following SNPs as listed in Table 391 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein AA161187_P19 (SEQ ID NO:1323) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 391 Nucleic acid SNPs
SNP position on         Alternative       Previously
nucleotide sequence     nucleic acid      known SNP?
66                      T -> A            No
67                      T -> G            No
105                     C -> T            No
108                     T ->              No
154                     T ->              No
190                     C -> G            No
469                     A -> G            Yes
571                     C -> T            Yes
719                     G -> T            Yes

As noted above, cluster AA161187 features 20 segment(s), which were listed in Table 371 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster AA161187_node_0 (SEQ ID NO:482) according to the present invention is supported by 21 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AA161187_T0 (SEQ ID NO:41), AA161187_T15 (SEQ ID NO:43), AA161187_T16 (SEQ ID NO:44), AA161187_T21 (SEQ ID NO:46) and AA161187_T22 (SEQ ID NO:47). Table 392 below describes the starting and ending position of this segment on each transcript.

TABLE 392 Segment location on transcripts
Transcript name                 Segment starting position    Segment ending position
AA161187_T0 (SEQ ID NO: 41)     1                            170
AA161187_T15 (SEQ ID NO: 43)    1                            170
AA161187_T16 (SEQ ID NO: 44)    1                            170
AA161187_T21 (SEQ ID NO: 46)    1                            170
AA161187_T22 (SEQ ID NO: 47)    1                            170

Segment cluster AA161187_node_6 (SEQ ID NO:483) according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AA161187_T7 (SEQ ID NO:42) and AA161187_T20 (SEQ ID NO:45). Table 393 below describes the starting and ending position of this segment on each transcript.

TABLE 393 Segment location on transcripts
Transcript name                 Segment starting position    Segment ending position
AA161187_T7 (SEQ ID NO: 42)     1                            120
AA161187_T20 (SEQ ID NO: 45)    1                            120

Segment cluster AA161187_node_14 (SEQ ID NO:484) according to the present invention is supported by 35 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AA161187_T0 (SEQ ID NO:41), AA161187_T7 (SEQ ID NO:42), AA161187_T15 (SEQ ID NO:43), AA161187_T16 (SEQ ID NO:44), AA161187_T20 (SEQ ID NO:45), AA161187_T21 (SEQ ID NO:46) and AA161187_T22 (SEQ ID NO:47). Table 394 below describes the starting and ending position of this segment on each transcript.

TABLE 394 Segment location on transcripts
Transcript name                 Segment starting position    Segment ending position
AA161187_T0 (SEQ ID NO: 41)     446                          656
AA161187_T7 (SEQ ID NO: 42)     377                          587
AA161187_T15 (SEQ ID NO: 43)    446                          656
AA161187_T16 (SEQ ID NO: 44)    446                          656
AA161187_T20 (SEQ ID NO: 45)    371                          581
AA161187_T21 (SEQ ID NO: 46)    446                          656
AA161187_T22 (SEQ ID NO: 47)    446                          656
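Since a segment is a single stretch of nucleic acid sequence, the span it occupies would be expected to have the same length on every transcript that contains it, even though its starting position differs. The Python sketch below is an illustrative consistency check on the values of Table 394 and is not part of the disclosed method. The dictionary name `NODE_14` and the function `segment_length` are hypothetical, and positions are assumed to be 1-based and inclusive.

```python
from typing import Dict, Tuple

# Start and end positions of segment AA161187_node_14, copied from Table 394.
NODE_14: Dict[str, Tuple[int, int]] = {
    "AA161187_T0": (446, 656),
    "AA161187_T7": (377, 587),
    "AA161187_T15": (446, 656),
    "AA161187_T16": (446, 656),
    "AA161187_T20": (371, 581),
    "AA161187_T21": (446, 656),
    "AA161187_T22": (446, 656),
}


def segment_length(start: int, end: int) -> int:
    """Length of a span given 1-based inclusive coordinates (assumed)."""
    return end - start + 1


lengths = {name: segment_length(*span) for name, span in NODE_14.items()}
# A single distinct value means the table is internally consistent.
distinct = set(lengths.values())
print(distinct)
```

All seven entries give a length of 211 nucleotides under the inclusive-coordinate assumption.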

Segment cluster AA161187_node_16 (SEQ ID NO:485) according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AA161187_T22 (SEQ ID NO:47). Table 395 below describes the starting and ending position of this segment on each transcript.

TABLE 395 Segment location on transcripts
Transcript name                 Segment starting position    Segment ending position
AA161187_T22 (SEQ ID NO: 47)    657                          953

Segment cluster AA161187_node_25 (SEQ ID NO:486) according to the present invention is supported by 13 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AA161187_T16 (SEQ ID NO:44) and AA161187_T20 (SEQ ID NO:45). Table 396 below describes the starting and ending position of this segment on each transcript.

TABLE 396 Segment location on transcripts
Transcript name                 Segment starting position    Segment ending position
AA161187_T16 (SEQ ID NO: 44)    880                          1104
AA161187_T20 (SEQ ID NO: 45)    768                          992

Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment (in relation to lung cancer), shown in Table 397.

TABLE 397 Oligonucleotides related to this segment
Oligonucleotide name                  Overexpressed in cancers    Chip reference
AA161187_0_0_430 (SEQ ID NO: 222)     lung malignant tumors       LUN

Segment cluster AA161187_node_26 (SEQ ID NO:487) according to the present invention is supported by 39 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AA161187_T0 (SEQ ID NO:41), AA161187_T7 (SEQ ID NO:42), AA161187_T15 (SEQ ID NO:43), AA161187_T16 (SEQ ID NO:44) and AA161187_T20 (SEQ ID NO:45). Table 398 below describes the starting and ending position of this segment on each transcript.

TABLE 398 Segment location on transcripts
Transcript name                 Segment starting position    Segment ending position
AA161187_T0 (SEQ ID NO: 41)     812                          1173
AA161187_T7 (SEQ ID NO: 42)     743                          1104
AA161187_T15 (SEQ ID NO: 43)    744                          1105
AA161187_T16 (SEQ ID NO: 44)    1105                         1466
AA161187_T20 (SEQ ID NO: 45)    993                          1354

Segment cluster AA161187_node_28 (SEQ ID NO:488) according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AA161187_T21 (SEQ ID NO:46). Table 399 below describes the starting and ending position of this segment on each transcript.

TABLE 399 Segment location on transcripts
Transcript name                 Segment starting position    Segment ending position
AA161187_T21 (SEQ ID NO: 46)    657                          1171

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.

Segment cluster AA161187_node_4 (SEQ ID NO:489) according to the present invention is supported by 22 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AA161187_T0 (SEQ ID NO:41), AA161187_T15 (SEQ ID NO:43), AA161187_T16 (SEQ ID NO:44), AA161187_T21 (SEQ ID NO:46) and AA161187_T22 (SEQ ID NO:47). Table 400 below describes the starting and ending position of this segment on each transcript.

TABLE 400 Segment location on transcripts
Transcript name                 Segment starting position    Segment ending position
AA161187_T0 (SEQ ID NO: 41)     171                          197
AA161187_T15 (SEQ ID NO: 43)    171                          197
AA161187_T16 (SEQ ID NO: 44)    171                          197
AA161187_T21 (SEQ ID NO: 46)    171                          197
AA161187_T22 (SEQ ID NO: 47)    171                          197

Segment cluster AA161187_node_7 (SEQ ID NO:490) according to the present invention can be found in the following transcript(s): AA161187_T7 (SEQ ID NO:42) and AA161187_T20 (SEQ ID NO:45). Table 401 below describes the starting and ending position of this segment on each transcript.

TABLE 401 Segment location on transcripts
Transcript name                 Segment starting position    Segment ending position
AA161187_T7 (SEQ ID NO: 42)     121                          128
AA161187_T20 (SEQ ID NO: 45)    121                          128

Segment cluster AA161187_node_8 (SEQ ID NO:491) according to the present invention is supported by 23 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AA161187_T0 (SEQ ID NO:41), AA161187_T7 (SEQ ID NO:42), AA161187_T15 (SEQ ID NO:43), AA161187_T16 (SEQ ID NO:44), AA161187_T20 (SEQ ID NO:45), AA161187_T21 (SEQ ID NO:46) and AA161187_T22 (SEQ ID NO:47). Table 402 below describes the starting and ending position of this segment on each transcript.

TABLE 402 Segment location on transcripts
Transcript name                 Segment starting position    Segment ending position
AA161187_T0 (SEQ ID NO: 41)     198                          256
AA161187_T7 (SEQ ID NO: 42)     129                          187
AA161187_T15 (SEQ ID NO: 43)    198                          256
AA161187_T16 (SEQ ID NO: 44)    198                          256
AA161187_T20 (SEQ ID NO: 45)    129                          187
AA161187_T21 (SEQ ID NO: 46)    198                          256
AA161187_T22 (SEQ ID NO: 47)    198                          256

Segment cluster AA161187_node_9 (SEQ ID NO:492) according to the present invention is supported by 24 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AA161187_T0 (SEQ ID NO:41), AA161187_T7 (SEQ ID NO:42), AA161187_T15 (SEQ ID NO:43), AA161187_T16 (SEQ ID NO:44), AA161187_T20 (SEQ ID NO:45), AA161187_T21 (SEQ ID NO:46) and AA161187_T22 (SEQ ID NO:47). Table 403 below describes the starting and ending position of this segment on each transcript.

TABLE 403 Segment location on transcripts
Transcript name                 Segment starting position    Segment ending position
AA161187_T0 (SEQ ID NO: 41)     257                          298
AA161187_T7 (SEQ ID NO: 42)     188                          229
AA161187_T15 (SEQ ID NO: 43)    257                          298
AA161187_T16 (SEQ ID NO: 44)    257                          298
AA161187_T20 (SEQ ID NO: 45)    188                          229
AA161187_T21 (SEQ ID NO: 46)    257                          298
AA161187_T22 (SEQ ID NO: 47)    257                          298

Segment cluster AA161187_node_10 (SEQ ID NO:493) according to the present invention is supported by 25 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AA161187_T0 (SEQ ID NO:41), AA161187_T7 (SEQ ID NO:42), AA161187_T15 (SEQ ID NO:43), AA161187_T16 (SEQ ID NO:44), AA161187_T20 (SEQ ID NO:45), AA161187_T21 (SEQ ID NO:46) and AA161187_T22 (SEQ ID NO:47). Table 404 below describes the starting and ending position of this segment on each transcript.

TABLE 404 Segment location on transcripts
Transcript name                 Segment starting position    Segment ending position
AA161187_T0 (SEQ ID NO: 41)     299                          363
AA161187_T7 (SEQ ID NO: 42)     230                          294
AA161187_T15 (SEQ ID NO: 43)    299                          363
AA161187_T16 (SEQ ID NO: 44)    299                          363
AA161187_T20 (SEQ ID NO: 45)    230                          294
AA161187_T21 (SEQ ID NO: 46)    299                          363
AA161187_T22 (SEQ ID NO: 47)    299                          363

Segment cluster AA161187_node_12 (SEQ ID NO:494) according to the present invention can be found in the following transcript(s): AA161187_T0 (SEQ ID NO:41), AA161187_T7 (SEQ ID NO:42), AA161187_T15 (SEQ ID NO:43), AA161187_T16 (SEQ ID NO:44), AA161187_T21 (SEQ ID NO:46) and AA161187_T22 (SEQ ID NO:47). Table 405 below describes the starting and ending position of this segment on each transcript.

TABLE 405 Segment location on transcripts Segment Segment Transcriptname starting position ending position AA161187_T0 (SEQ ID NO: 41) 364369 AA161187_T7 (SEQ ID NO: 42) 295 300 AA161187_T15 (SEQ ID NO: 43) 364369 AA161187_T16 (SEQ ID NO: 44) 364 369 AA161187_T21 (SEQ ID NO: 46)364 369 AA161187_T22 (SEQ ID NO: 47) 364 369

Segment cluster AA161187_node_13 (SEQ ID NO:495) according to the present invention is supported by 25 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AA161187_T0 (SEQ ID NO:41), AA161187_T7 (SEQ ID NO:42), AA161187_T15 (SEQ ID NO:43), AA161187_T16 (SEQ ID NO:44), AA161187_T20 (SEQ ID NO:45), AA161187_T21 (SEQ ID NO:46) and AA161187_T22 (SEQ ID NO:47). Table 406 below describes the starting and ending position of this segment on each transcript.

TABLE 406 Segment location on transcripts

Transcript name                  Segment starting position   Segment ending position
AA161187_T0 (SEQ ID NO: 41)      370                         445
AA161187_T7 (SEQ ID NO: 42)      301                         376
AA161187_T15 (SEQ ID NO: 43)     370                         445
AA161187_T16 (SEQ ID NO: 44)     370                         445
AA161187_T20 (SEQ ID NO: 45)     295                         370
AA161187_T21 (SEQ ID NO: 46)     370                         445
AA161187_T22 (SEQ ID NO: 47)     370                         445
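The coordinates in Tables 403 to 406 can be compared across transcripts to see where the transcripts diverge. The following is a minimal sketch only: the coordinates are copied from those tables for three of the transcripts, while the dictionary layout and the helper names (`segment_length`, `offset_to_reference`) are illustrative assumptions and not part of the application. Positions are assumed to be 1-based and inclusive.

```python
# Start/end coordinates copied from Tables 403-406 for three transcripts.
# The data layout and helper names are illustrative assumptions.
SEGMENTS = {
    "node_9":  {"T0": (257, 298), "T7": (188, 229), "T20": (188, 229)},
    "node_10": {"T0": (299, 363), "T7": (230, 294), "T20": (230, 294)},
    "node_12": {"T0": (364, 369), "T7": (295, 300)},  # not listed for T20
    "node_13": {"T0": (370, 445), "T7": (301, 376), "T20": (295, 370)},
}


def segment_length(start, end):
    """Length of a segment given 1-based inclusive coordinates (assumed)."""
    return end - start + 1


def offset_to_reference(node, transcript, reference="T0"):
    """Start on the reference transcript minus start on `transcript`.

    Returns None when the segment is not listed for `transcript`.
    """
    positions = SEGMENTS[node]
    if transcript not in positions:
        return None
    return positions[reference][0] - positions[transcript][0]


for node in SEGMENTS:
    print(
        node,
        segment_length(*SEGMENTS[node]["T0"]),
        offset_to_reference(node, "T7"),
        offset_to_reference(node, "T20"),
    )
```

Under these coordinates, the shift of AA161187_T20 relative to AA161187_T0 grows from 69 to 75 at node_13. The difference of 6 equals the length of node_12, the one segment of this group that is not listed for AA161187_T20.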

Segment cluster AA161187_node_19 (SEQ ID NO:496) according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AA161187_T16 (SEQ ID NO:44). Table 407 below describes the starting and ending position of this segment on each transcript.

TABLE 407 Segment location on transcripts

Transcript name                  Segment starting position   Segment ending position
AA161187_T16 (SEQ ID NO: 44)     657                         693

Segment cluster AA161187_node_20 (SEQ ID NO:497) according to the present invention is supported by 28 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AA161187_T0 (SEQ ID NO:41), AA161187_T7 (SEQ ID NO:42), AA161187_T16 (SEQ ID NO:44) and AA161187_T20 (SEQ ID NO:45). Table 408 below describes the starting and ending position of this segment on each transcript.

TABLE 408 Segment location on transcripts

Transcript name                  Segment starting position   Segment ending position
AA161187_T0 (SEQ ID NO: 41)      657                         682
AA161187_T7 (SEQ ID NO: 42)      588                         613
AA161187_T16 (SEQ ID NO: 44)     694                         719
AA161187_T20 (SEQ ID NO: 45)     582                         607

Segment cluster AA161187_node_21 (SEQ ID NO:498) according to the present invention is supported by 31 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AA161187_T0 (SEQ ID NO:41), AA161187_T7 (SEQ ID NO:42), AA161187_T15 (SEQ ID NO:43), AA161187_T16 (SEQ ID NO:44) and AA161187_T20 (SEQ ID NO:45). Table 409 below describes the starting and ending position of this segment on each transcript.

TABLE 409 Segment location on transcripts

Transcript name                  Segment starting position   Segment ending position
AA161187_T0 (SEQ ID NO: 41)      683                         741
AA161187_T7 (SEQ ID NO: 42)      614                         672
AA161187_T15 (SEQ ID NO: 43)     657                         715
AA161187_T16 (SEQ ID NO: 44)     720                         778
AA161187_T20 (SEQ ID NO: 45)     608                         666

Segment cluster AA161187_node_22 (SEQ ID NO:499) according to the present invention is supported by 34 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AA161187_T0 (SEQ ID NO:41), AA161187_T7 (SEQ ID NO:42), AA161187_T15 (SEQ ID NO:43), AA161187_T16 (SEQ ID NO:44) and AA161187_T20 (SEQ ID NO:45). Table 410 below describes the starting and ending position of this segment on each transcript.

TABLE 410 Segment location on transcripts

Transcript name                  Segment starting position   Segment ending position
AA161187_T0 (SEQ ID NO: 41)      742                         769
AA161187_T7 (SEQ ID NO: 42)      673                         700
AA161187_T15 (SEQ ID NO: 43)     716                         743
AA161187_T16 (SEQ ID NO: 44)     779                         806
AA161187_T20 (SEQ ID NO: 45)     667                         694

Segment cluster AA161187_node_23 (SEQ ID NO:500) according to the present invention is supported by 31 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AA161187_T0 (SEQ ID NO:41), AA161187_T7 (SEQ ID NO:42), AA161187_T16 (SEQ ID NO:44) and AA161187_T20 (SEQ ID NO:45). Table 411 below describes the starting and ending position of this segment on each transcript.

TABLE 411 Segment location on transcripts

Transcript name                  Segment starting position   Segment ending position
AA161187_T0 (SEQ ID NO: 41)      770                         811
AA161187_T7 (SEQ ID NO: 42)      701                         742
AA161187_T16 (SEQ ID NO: 44)     807                         848
AA161187_T20 (SEQ ID NO: 45)     695                         736

Segment cluster AA161187_node_24 (SEQ ID NO:501) according to the present invention is supported by 12 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AA161187_T16 (SEQ ID NO:44) and AA161187_T20 (SEQ ID NO:45). Table 412 below describes the starting and ending position of this segment on each transcript.

TABLE 412 Segment location on transcripts

Transcript name                  Segment starting position   Segment ending position
AA161187_T16 (SEQ ID NO: 44)     849                         879
AA161187_T20 (SEQ ID NO: 45)     737                         767

Variant protein alignment to the previously known protein:

Sequence name: TEST_HUMAN (SEQ ID NO: 1431)
Sequence documentation: Alignment of: AA161187_P6 (SEQ ID NO: 1319) × TEST_HUMAN (SEQ ID NO: 1431)
Alignment segment 1/1:
Quality: 2894.00   Escore: 0
Matching length: 284   Total length: 284
Matching Percent Similarity: 100.00   Matching Percent Identity: 100.00
Total Percent Similarity: 100.00   Total Percent Identity: 100.00
Gaps: 0
Alignment:

Sequence name: TEST_HUMAN (SEQ ID NO: 1431)
Sequence documentation: Alignment of: AA161187_P13 (SEQ ID NO: 1320) × TEST_HUMAN (SEQ ID NO: 1431)
Alignment segment 1/1:
Quality: 1829.00   Escore: 0
Matching length: 183   Total length: 183
Matching Percent Similarity: 100.00   Matching Percent Identity: 100.00
Total Percent Similarity: 100.00   Total Percent Identity: 100.00
Gaps: 0
Alignment:

Sequence name: TEST_HUMAN (SEQ ID NO: 1431)
Sequence documentation: Alignment of: AA161187_P14 (SEQ ID NO: 1321) × TEST_HUMAN (SEQ ID NO: 1431)
Alignment segment 1/1:
Quality: 1829.00   Escore: 0
Matching length: 183   Total length: 183
Matching Percent Similarity: 100.00   Matching Percent Identity: 100.00
Total Percent Similarity: 100.00   Total Percent Identity: 100.00
Gaps: 0
Alignment:

Sequence name: TEST_HUMAN (SEQ ID NO: 1431)
Sequence documentation: Alignment of: AA161187_P18 (SEQ ID NO: 1322) × TEST_HUMAN (SEQ ID NO: 1431)
Alignment segment 1/1:
Quality: 1957.00   Escore: 0
Matching length: 203   Total length: 205
Matching Percent Similarity: 100.00   Matching Percent Identity: 100.00
Total Percent Similarity: 99.02   Total Percent Identity: 99.02
Gaps: 1
Alignment:

Sequence name: TEST_HUMAN (SEQ ID NO: 1431)
Sequence documentation: Alignment of: AA161187_P19 (SEQ ID NO: 1323) × TEST_HUMAN (SEQ ID NO: 1431)
Alignment segment 1/1:
Quality: 1829.00   Escore: 0
Matching length: 183   Total length: 183
Matching Percent Similarity: 100.00   Matching Percent Identity: 100.00
Total Percent Similarity: 100.00   Total Percent Identity: 100.00
Gaps: 0
Alignment:
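The relationship between the matching and total figures in the alignment summaries above can be illustrated with the AA161187_P18 entry. This is a sketch under an assumption: that the total percentages are the matching positions divided by the total alignment length. That reading reproduces the reported 99.02, but the application does not state the formula, and the function name `percent_over` is hypothetical.

```python
def percent_over(matching, length):
    """Percentage of `matching` positions over `length` positions, two decimals."""
    return round(100.0 * matching / length, 2)


# Values reported above for AA161187_P18 versus TEST_HUMAN.
matching_length = 203
total_length = 205

# Assumed interpretation: all 203 matched positions are identical
# (Matching Percent Identity is reported as 100.00).
total_identity = percent_over(matching_length, total_length)
print(total_identity)
```

For the other four alignments the matching and total lengths are equal, so the same calculation gives 100.00.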

Expression of Homo sapiens protease, serine, 21 (testisin) (PRSS21) AA161187 transcripts which are detectable by amplicon as depicted in sequence name AA161187 seg25 (SEQ ID NO:1654) in normal and cancerous lung tissues

Expression of Homo sapiens protease, serine, 21 (testisin) (PRSS21) transcripts detectable by or according to seg25, AA161187 seg25 amplicon (SEQ ID NO:1654) and primers AA161187 seg17F2 (SEQ ID NO:1652) and AA161187 seg17R2 (SEQ ID NO:1653) was measured by real time PCR. In parallel the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323 (SEQ ID NO:1713); amplicon: PBGD-amplicon, SEQ ID NO:334), HPRT1 (GenBank Accession No. NM_000194 (SEQ ID NO:1714); amplicon: HPRT1-amplicon, SEQ ID NO:1297), Ubiquitin (GenBank Accession No. BC000449 (SEQ ID NO:1711); amplicon: Ubiquitin-amplicon, SEQ ID NO:328) and SDHA (GenBank Accession No. NM_004168 (SEQ ID NO:1712); amplicon: SDHA-amplicon, SEQ ID NO:331), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 47-50, 90-93, 96-99, Table 2, above), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples.

FIG. 64 is a histogram showing overexpression of the above-indicated Homo sapiens protease, serine, 21 (testisin) (PRSS21) transcripts in cancerous lung samples relative to the normal samples.

As is evident from FIG. 64, the expression of Homo sapiens protease, serine, 21 (testisin) (PRSS21) transcripts detectable by the above amplicon(s) was higher in a few cancer samples than in the non-cancerous samples (Sample Nos. 46-50, 90-93, 96-99, Table 2). Notably, an over-expression of at least 6 fold was found in 1 out of 15 adenocarcinoma samples, 3 out of 16 squamous cell carcinoma samples and 1 out of 4 large cell carcinoma samples.
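The normalization described above (geometric mean of the housekeeping genes, then division by the median of the normal samples, then a 6-fold cutoff) can be sketched as follows. All quantities and sample names in this example are hypothetical and chosen only to make the arithmetic easy to follow; they are not data from the application, and the function names are illustrative.

```python
from statistics import geometric_mean, median


def normalized_quantity(target, housekeeping):
    """Target quantity divided by the geometric mean of the housekeeping quantities."""
    return target / geometric_mean(housekeeping)


def fold_over_normal(normalized, normal_ids):
    """Divide every normalized quantity by the median of the normal samples."""
    baseline = median(normalized[s] for s in normal_ids)
    return {s: q / baseline for s, q in normalized.items()}


# Hypothetical quantities: (target amplicon, [PBGD, HPRT1, Ubiquitin, SDHA]).
samples = {
    "normal_a": (1.0, [1.0, 1.0, 1.0, 1.0]),
    "normal_b": (2.0, [1.0, 1.0, 1.0, 1.0]),
    "normal_c": (3.0, [1.0, 1.0, 1.0, 1.0]),
    "tumor_x": (28.0, [1.0, 4.0, 2.0, 2.0]),
    "tumor_y": (4.0, [2.0, 2.0, 2.0, 2.0]),
}

normalized = {s: normalized_quantity(t, hk) for s, (t, hk) in samples.items()}
folds = fold_over_normal(normalized, ["normal_a", "normal_b", "normal_c"])

# Samples meeting the "at least 6 fold" criterion used in the text.
over_expressed = [s for s, f in folds.items() if s.startswith("tumor") and f >= 6]
print(over_expressed)
```

The geometric mean is used here, as in the text, so that a single housekeeping gene with a large quantity does not dominate the normalization factor.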

Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: AA161187 seg17F2 forward primer (SEQ ID NO:1652); and AA161187 seg17R2 reverse primer (SEQ ID NO:1653).

The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: AA161187 seg25 (SEQ ID NO:1654).

Forward primer AA161187 seg17F2 (SEQ ID NO:1652): CCCTGTGCCTTATTTGACCCT
Reverse primer AA161187 seg17R2 (SEQ ID NO:1653): GCTGGGTAGACTGGGTGCA
Amplicon AA161187 seg25 (SEQ ID NO:1654): CCTGTGCCTTATTTGACCCTCATGCCAACCCCGGGAGGTGGAGACTGTTGCCCCACTCTGCAGATGCAGAAACGGAGGCTTGGCTGCTGCCAGGGGGAGGA

Description for Cluster R66178

Cluster R66178 features 3 transcript(s) and 16 segment(s) of interest, the names for which are given in Tables 413 and 414, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 415.

TABLE 413 Transcripts of interest

Transcript Name   Sequence ID No.
R66178_T2         48
R66178_T3         49
R66178_T7         50

TABLE 414 Segments of interest

Segment Name      Sequence ID No.
R66178_node_0     502
R66178_node_6     503
R66178_node_8     504
R66178_node_15    505
R66178_node_24    506
R66178_node_26    507
R66178_node_27    508
R66178_node_4     509
R66178_node_5     510
R66178_node_9     511
R66178_node_11    512
R66178_node_16    513
R66178_node_18    514
R66178_node_19    515
R66178_node_20    516
R66178_node_21    517

TABLE 415 Proteins of interest

Protein Name   Sequence ID No.   Corresponding Transcript(s)
R66178_P3      1324              R66178_T2 (SEQ ID NO: 48)
R66178_P4      1325              R66178_T3 (SEQ ID NO: 49)
R66178_P8      1326              R66178_T7 (SEQ ID NO: 50)

These sequences are variants of the known protein Poliovirus receptor related protein 1 precursor (SwissProt accession identifier PVR1_HUMAN; known also according to the synonyms Herpes virus entry mediator C; HveC; Nectin 1; Herpesvirus Ig-like receptor; HIgR; CD111 antigen), SEQ ID NO: 1432, referred to herein as the previously known protein.

Protein Poliovirus receptor related protein 1 precursor (SEQ ID NO:1432) is known or believed to have the following function(s): probably involved in cell adhesion; receptor for alphaherpesvirus (HSV-1, HSV-2 and Pseudorabies virus) entry into cells. The sequence for protein Poliovirus receptor related protein 1 precursor is given at the end of the application, as “Poliovirus receptor related protein 1 precursor amino acid sequence”. Protein Poliovirus receptor related protein 1 precursor localization is believed to be Type I membrane protein (isoforms alpha and delta); secreted (isoform gamma).

The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: immune response; cell-cell adhesion, which are annotation(s) related to Biological Process; cell adhesion receptor; protein binding; coreceptor, which are annotation(s) related to Molecular Function; and adherens junction; integral membrane protein, which are annotation(s) related to Cellular Component.

The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <dot expasy dot ch/sprot/>; or Locuslink, available from <dot ncbi dot nlm dot nih dot gov/projects/LocusLink/>.

As noted above, cluster R66178 features 3 transcript(s), which were listed in Table 413 above. These transcript(s) encode for protein(s) which are variant(s) of protein Poliovirus receptor related protein 1 precursor (SEQ ID NO:1432). A description of each variant protein according to the present invention is now provided.

Variant protein R66178_P3 (SEQ ID NO:1324) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R66178_T2 (SEQ ID NO:48). An alignment is given to the known protein (Poliovirus receptor related protein 1 precursor (SEQ ID NO:1432)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between R66178_P3 (SEQ ID NO:1324) and PVR1_HUMAN (SEQ ID NO:1432):

1. An isolated chimeric polypeptide encoding for R66178_P3 (SEQ ID NO:1324), comprising a first amino acid sequence being at least 90% homologous to MARMGLAGAAGRWWGLALGLTAFFLPGVHSQVVQVNDSMYGFIGTDVVLHCSFANPLPSVKITQVTWQKSTNGSKQNVAIYNPSMGVSVLAPYRERVEFLRPSFTDGTIRLSRLELEDEGVYICEFATFPTGNRESQLNLTVMAKPTNWIEGTQAVLRAKKGQDDKVLVATCTSANGKPPSVVSWETRLKGEAEYQEIRNPNGTVTVISRYRLVPSREAHQQSLACIVNYHMDRFKESLTLNVQYEPEVTIEGFDGNWYLQRMDVKLTCKADANPPATEYHWTTLNGSLPKGVEAQNRTLFFKGPINYSLAGTYICEATNPIGTRSGQVEVNIT corresponding to amino acids 1-334 of PVR1_HUMAN (SEQ ID NO:1432), which also corresponds to amino acids 1-334 of R66178_P3 (SEQ ID NO:1324), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GEGHSLPISPGVLQTQNCGP (SEQ ID NO: 694) corresponding to amino acids 335-354 of R66178_P3 (SEQ ID NO:1324), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of R66178_P3 (SEQ ID NO:1324), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GEGHSLPISPGVLQTQNCGP (SEQ ID NO: 694) in R66178_P3 (SEQ ID NO:1324).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein R66178_P3 (SEQ ID NO:1324) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 416 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R66178_P3 (SEQ ID NO:1324) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 416 Amino acid mutations

SNP position(s) on amino acid sequence   Alternative amino acid(s)   Previously known SNP?
77                                       N -> S                      No

The glycosylation sites of variant protein R66178_P3 (SEQ ID NO:1324), as compared to the known protein Poliovirus receptor related protein 1 precursor (SEQ ID NO:1432), are described in Table 417 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).

TABLE 417 Glycosylation site(s)

Position(s) on known amino acid sequence   Present in variant protein?   Position in variant protein?
72                                         yes                           72
297                                        yes                           297
202                                        yes                           202
307                                        yes                           307
332                                        yes                           332
139                                        yes                           139
36                                         yes                           36
286                                        yes                           286

Variant protein R66178_P3 (SEQ ID NO:1324) is encoded by the following transcript(s): R66178_T2 (SEQ ID NO:48), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R66178_T2 (SEQ ID NO:48) is shown in bold; this coding portion starts at position 634 and ends at position 1695. The transcript also has the following SNPs as listed in Table 418 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R66178_P3 (SEQ ID NO:1324) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 418 Nucleic acid SNPs

SNP position on nucleotide sequence   Alternative nucleic acid   Previously known SNP?
474                                   -> T                       No
476                                   -> C                       No
632                                   -> T                       No
633                                   G -> T                     No
863                                   A -> G                     No
897                                   C -> T                     Yes
2178                                  A -> G                     No
2465                                  G -> A                     Yes
2687                                  G -> A                     Yes
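The nucleotide SNP positions of Table 418 can be related to the amino acid SNP position of Table 416 through the stated coding portion (positions 634 to 1695 of R66178_T2). The following is a minimal sketch, assuming 1-based inclusive positions and a reading frame that begins at the first position of the coding portion; the function name `nucleotide_to_codon` is hypothetical.

```python
def nucleotide_to_codon(position, cds_start, cds_end):
    """Map a 1-based transcript position to (amino acid number, position in codon).

    Returns None when the position lies outside the coding portion.
    Assumes the reading frame starts at `cds_start`.
    """
    if not cds_start <= position <= cds_end:
        return None
    offset = position - cds_start
    return offset // 3 + 1, offset % 3 + 1


# Coding portion of R66178_T2 as stated in the text.
CDS_START, CDS_END = 634, 1695

for snp in (474, 633, 863, 897, 2178):
    print(snp, nucleotide_to_codon(snp, CDS_START, CDS_END))
```

Under these assumptions, position 863 falls on the second base of codon 77, which agrees with the amino acid SNP at position 77 listed in Table 416, while positions 474, 633 and 2178 fall outside the coding portion.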

Variant protein R66178_P4 (SEQ ID NO:1325) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R66178_T3 (SEQ ID NO:49). An alignment is given to the known protein (Poliovirus receptor related protein 1 precursor (SEQ ID NO:1432)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between R66178_P4 (SEQ ID NO:1325) and PVR1_HUMAN (SEQ ID NO:1432):

1. An isolated chimeric polypeptide encoding for R66178_P4 (SEQ ID NO:1325), comprising a first amino acid sequence being at least 90% homologous to MARMGLAGAAGRWWGLALGLTAFFLPGVHSQVVQVNDSMYGFIGTDVVLHCSFANPLPSVKITQVTWQKSTNGSKQNVAIYNPSMGVSVLAPYRERVEFLRPSFTDGTIRLSRLELEDEGVYICEFATFPTGNRESQLNLTVMAKPTNWIEGTQAVLRAKKGQDDKVLVATCTSANGKPPSVVSWETRLKGEAEYQEIRNPNGTVTVISRYRLVPSREAHQQSLACIVNYHMDRFKESLTLNVQYEPEVTIEGFDGNWYLQRMDVKLTCKADANPPATEYHWTTLNGSLPKGVEAQNRTLFFKGPINYSLAGTYICEATNPIGTRSGQVEVNIT corresponding to amino acids 1-334 of PVR1_HUMAN (SEQ ID NO:1432), which also corresponds to amino acids 1-334 of R66178_P4 (SEQ ID NO:1325), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence AFCQLIYPGKGRTRARMF (SEQ ID NO: 1702) corresponding to amino acids 335-352 of R66178_P4 (SEQ ID NO:1325), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of R66178_P4 (SEQ ID NO:1325), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence AFCQLIYPGKGRTRARMF (SEQ ID NO: 1702) in R66178_P4 (SEQ ID NO:1325).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein R66178_P4 (SEQ ID NO:1325) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 419 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R66178_P4 (SEQ ID NO:1325) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 419 Amino acid mutations

SNP position(s) on amino acid sequence   Alternative amino acid(s)   Previously known SNP?
77                                       N -> S                      No

The glycosylation sites of variant protein R66178_P4 (SEQ ID NO:1325), as compared to the known protein Poliovirus receptor related protein 1 precursor (SEQ ID NO:1432), are described in Table 420 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).

TABLE 420 Glycosylation site(s)

Position(s) on known amino acid sequence   Present in variant protein?   Position in variant protein?
72                                         yes                           72
297                                        yes                           297
202                                        yes                           202
307                                        yes                           307
332                                        yes                           332
139                                        yes                           139
36                                         yes                           36
286                                        yes                           286

Variant protein R66178_P4 (SEQ ID NO:1325) is encoded by the following transcript(s): R66178_T3 (SEQ ID NO:49), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R66178_T3 (SEQ ID NO:49) is shown in bold; this coding portion starts at position 634 and ends at position 1689. The transcript also has the following SNPs as listed in Table 421 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R66178_P4 (SEQ ID NO:1325) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 421 Nucleic acid SNPs

SNP position on nucleotide sequence   Alternative nucleic acid   Previously known SNP?
474                                   -> T                       No
476                                   -> C                       No
632                                   -> T                       No
633                                   G -> T                     No
863                                   A -> G                     No
897                                   C -> T                     Yes
1762                                  C ->                       Yes

Variant protein R66178_P8 (SEQ ID NO:1326) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R66178_T7 (SEQ ID NO:50). An alignment is given to the known protein (Poliovirus receptor related protein 1 precursor (SEQ ID NO:1432)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between R66178_P8 (SEQ ID NO:1326) and PVR1_HUMAN (SEQ ID NO:1432):

1. An isolated chimeric polypeptide encoding for R66178_P8 (SEQ ID NO:1326), comprising a first amino acid sequence being at least 90% homologous to MARMGLAGAAGRWWGLALGLTAFFLPGVHSQVVQVNDSMYGFIGTDVVLHCSFANPLPSVKITQVTWQKSTNGSKQNVAIYNPSMGVSVLAPYRERVEFLRPSFTDGTIRLSRLELEDEGVYICEFATFPTGNRESQLNLTVMAKPTNWIEGTQAVLRAKKGQDDKVLVATCTSANGKPPSVVSWETRLKGEAEYQEIRNPNGTVTVISRYRLVPSREAHQQSLACIVNYHMDRFKESLTLNVQYEPEVTIEGFDGNWYLQRMDVKLTCKADANPPATEYHWTTLNGSLPKGVEAQNRTLFFKGPINYSLAGTYICEATNPIGTRSGQVE corresponding to amino acids 1-330 of PVR1_HUMAN (SEQ ID NO:1432), which also corresponds to amino acids 1-330 of R66178_P8 (SEQ ID NO:1326), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NSPTPRLLPNMGGAPGRCPRPSLGAWRGASCWC (SEQ ID NO: 1717) corresponding to amino acids 331-363 of R66178_P8 (SEQ ID NO:1326), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of R66178_P8 (SEQ ID NO:1326), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NSPTPRLLPNMGGAPGRCPRPSLGAWRGASCWC (SEQ ID NO: 1717) in R66178_P8 (SEQ ID NO:1326).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein R66178_P8 (SEQ ID NO:1326) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 422 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R66178_P8 (SEQ ID NO:1326) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 422 Amino acid mutations

SNP position(s) on amino acid sequence   Alternative amino acid(s)   Previously known SNP?
77                                       N -> S                      No

The glycosylation sites of variant protein R66178_P8 (SEQ ID NO:1326), as compared to the known protein Poliovirus receptor related protein 1 precursor (SEQ ID NO:1432), are described in Table 423 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).

TABLE 423 Glycosylation site(s)

Position(s) on known amino acid sequence   Present in variant protein?   Position in variant protein?
72                                         yes                           72
297                                        yes                           297
202                                        yes                           202
307                                        yes                           307
332                                        no
139                                        yes                           139
36                                         yes                           36
286                                        yes                           286
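The pattern in Tables 417, 420 and 423 is consistent with a simple positional check against the region each variant shares with the known protein (amino acids 1-334 for R66178_P3 and R66178_P4, 1-330 for R66178_P8). The sketch below is a simplification made for illustration: it only tests whether a site position lies within the shared region and does not evaluate the surrounding residues, and it is not presented as the method used in the application. The function name is hypothetical.

```python
# Glycosylation site positions on the known protein, from the tables above.
KNOWN_SITES = [36, 72, 139, 202, 286, 297, 307, 332]

# Length of the region shared with PVR1_HUMAN, from the comparison reports.
SHARED_LENGTH = {"R66178_P3": 334, "R66178_P4": 334, "R66178_P8": 330}


def sites_within_shared_region(sites, shared_length):
    """Map each site to True when it lies inside the shared N-terminal region."""
    return {site: site <= shared_length for site in sites}


for variant, length in SHARED_LENGTH.items():
    retained = sites_within_shared_region(KNOWN_SITES, length)
    missing = [site for site, kept in retained.items() if not kept]
    print(variant, "sites outside shared region:", missing)
```

With these inputs only position 332 of R66178_P8 falls outside the shared region, matching the single "no" entry in Table 423.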

Variant protein R66178_P8 (SEQ ID NO:1326) is encoded by the following transcript(s): R66178_T7 (SEQ ID NO:50), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R66178_T7 (SEQ ID NO:50) is shown in bold; this coding portion starts at position 634 and ends at position 1722. The transcript also has the following SNPs as listed in Table 424 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R66178_P8 (SEQ ID NO:1326) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 424 Nucleic acid SNPs

SNP position on nucleotide sequence   Alternative nucleic acid   Previously known SNP?
474                                   -> T                       No
476                                   -> C                       No
632                                   -> T                       No
633                                   G -> T                     No
863                                   A -> G                     No
897                                   C -> T                     Yes
2210                                  A -> C                     No
2211                                  A -> C                     No
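The coding portions and protein lengths given above for the three R66178 variants can be cross-checked against each other. The following is a sketch under stated assumptions: positions are 1-based and inclusive, the protein length is taken as the shared head length plus the length of the unique tail sequence from the comparison reports, and the dictionary layout and function names are illustrative only.

```python
# Coding portion, shared head length and unique tail, as given in the text.
VARIANTS = {
    "R66178_P3": {"cds": (634, 1695), "head": 334,
                  "tail": "GEGHSLPISPGVLQTQNCGP"},
    "R66178_P4": {"cds": (634, 1689), "head": 334,
                  "tail": "AFCQLIYPGKGRTRARMF"},
    "R66178_P8": {"cds": (634, 1722), "head": 330,
                  "tail": "NSPTPRLLPNMGGAPGRCPRPSLGAWRGASCWC"},
}


def protein_length(variant):
    """Head length plus tail length, in amino acids."""
    return variant["head"] + len(variant["tail"])


def cds_length(variant):
    """Length of the coding portion in nucleotides (1-based inclusive, assumed)."""
    start, end = variant["cds"]
    return end - start + 1


for name, variant in VARIANTS.items():
    print(name, protein_length(variant), cds_length(variant),
          cds_length(variant) == 3 * protein_length(variant))
```

For all three variants the coding portion is exactly three nucleotides per amino acid (354, 352 and 363 amino acids respectively), which suggests that the stated end position does not include a stop codon, although the application does not say so explicitly.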

As noted above, cluster R66178 features 16 segment(s), which were listed in Table 414 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster R66178_node_0 (SEQ ID NO:502) according to the present invention is supported by 19 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R66178_T2 (SEQ ID NO:48), R66178_T3 (SEQ ID NO:49) and R66178_T7 (SEQ ID NO:50). Table 425 below describes the starting and ending position of this segment on each transcript.

TABLE 425 Segment location on transcripts

Transcript name               Segment starting position   Segment ending position
R66178_T2 (SEQ ID NO: 48)     1                           712
R66178_T3 (SEQ ID NO: 49)     1                           712
R66178_T7 (SEQ ID NO: 50)     1                           712

Segment cluster R66178_node_6 (SEQ ID NO:503) according to the present invention is supported by 39 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R66178_T2 (SEQ ID NO:48), R66178_T3 (SEQ ID NO:49) and R66178_T7 (SEQ ID NO:50). Table 426 below describes the starting and ending position of this segment on each transcript.

TABLE 426 Segment location on transcripts

Transcript name               Segment starting position   Segment ending position
R66178_T2 (SEQ ID NO: 48)     762                         1063
R66178_T3 (SEQ ID NO: 49)     762                         1063
R66178_T7 (SEQ ID NO: 50)     762                         1063

Segment cluster R66178_node_8 (SEQ ID NO:504) according to the present invention is supported by 39 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R66178_T2 (SEQ ID NO:48), R66178_T3 (SEQ ID NO:49) and R66178_T7 (SEQ ID NO:50). Table 427 below describes the starting and ending position of this segment on each transcript.

TABLE 427 Segment location on transcripts

Transcript name               Segment starting position   Segment ending position
R66178_T2 (SEQ ID NO: 48)     1064                        1269
R66178_T3 (SEQ ID NO: 49)     1064                        1269
R66178_T7 (SEQ ID NO: 50)     1064                        1269

Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment (in relation to lung cancer), shown in Table 428.

TABLE 428 Oligonucleotides related to this segment

Oligonucleotide name             Overexpressed in cancers   Chip reference
R66178_0_7_0 (SEQ ID NO: 223)    lung malignant tumors      LUN

Segment cluster R66178_node_15 (SEQ ID NO:505) according to the present invention is supported by 40 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R66178_T2 (SEQ ID NO:48), R66178_T3 (SEQ ID NO:49) and R66178_T7 (SEQ ID NO:50). Table 429 below describes the starting and ending position of this segment on each transcript.

TABLE 429 Segment location on transcripts

Transcript name               Segment starting position   Segment ending position
R66178_T2 (SEQ ID NO: 48)     1485                        1623
R66178_T3 (SEQ ID NO: 49)     1485                        1623
R66178_T7 (SEQ ID NO: 50)     1485                        1623

Segment cluster R66178_node_24 (SEQ ID NO:506) according to the present invention is supported by 10 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R66178_T2 (SEQ ID NO:48). Table 430 below describes the starting and ending position of this segment on each transcript.

TABLE 430 Segment location on transcripts

Transcript name               Segment starting position   Segment ending position
R66178_T2 (SEQ ID NO: 48)     1637                        3110

Segment cluster R66178_node_26 (SEQ ID NO:507) according to the present invention is supported by 24 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R66178_T7 (SEQ ID NO:50). Table 431 below describes the starting and ending position of this segment on each transcript.

TABLE 431 Segment location on transcripts

Transcript name               Segment starting position   Segment ending position
R66178_T7 (SEQ ID NO: 50)     1624                        2087

Segment cluster R66178_node_27 (SEQ ID NO:508) according to the present invention is supported by 12 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R66178_T7 (SEQ ID NO:50). Table 432 below describes the starting and ending position of this segment on each transcript.

TABLE 432 Segment location on transcripts

Transcript name               Segment starting position   Segment ending position
R66178_T7 (SEQ ID NO: 50)     2088                        2364

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.

Segment cluster R66178_node_4 (SEQ ID NO:509) according to the present invention is supported by 21 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R66178_T2 (SEQ ID NO:48), R66178_T3 (SEQ ID NO:49) and R66178_T7 (SEQ ID NO:50). Table 433 below describes the starting and ending position of this segment on each transcript.

TABLE 433 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
R66178_T2 (SEQ ID NO: 48) | 713 | 749
R66178_T3 (SEQ ID NO: 49) | 713 | 749
R66178_T7 (SEQ ID NO: 50) | 713 | 749

Segment cluster R66178_node_5 (SEQ ID NO:510) according to the present invention can be found in the following transcript(s): R66178_T2 (SEQ ID NO:48), R66178_T3 (SEQ ID NO:49) and R66178_T7 (SEQ ID NO:50). Table 434 below describes the starting and ending position of this segment on each transcript.

TABLE 434 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
R66178_T2 (SEQ ID NO: 48) | 750 | 761
R66178_T3 (SEQ ID NO: 49) | 750 | 761
R66178_T7 (SEQ ID NO: 50) | 750 | 761

Segment cluster R66178_node_9 (SEQ ID NO:511) according to the present invention is supported by 44 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R66178_T2 (SEQ ID NO:48), R66178_T3 (SEQ ID NO:49) and R66178_T7 (SEQ ID NO:50). Table 435 below describes the starting and ending position of this segment on each transcript.

TABLE 435 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
R66178_T2 (SEQ ID NO: 48) | 1270 | 1366
R66178_T3 (SEQ ID NO: 49) | 1270 | 1366
R66178_T7 (SEQ ID NO: 50) | 1270 | 1366

Segment cluster R66178_node_11 (SEQ ID NO:512) according to the present invention is supported by 44 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R66178_T2 (SEQ ID NO:48), R66178_T3 (SEQ ID NO:49) and R66178_T7 (SEQ ID NO:50). Table 436 below describes the starting and ending position of this segment on each transcript.

TABLE 436 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
R66178_T2 (SEQ ID NO: 48) | 1367 | 1484
R66178_T3 (SEQ ID NO: 49) | 1367 | 1484
R66178_T7 (SEQ ID NO: 50) | 1367 | 1484

Segment cluster R66178_node_16 (SEQ ID NO:513) according to the present invention can be found in the following transcript(s): R66178_T2 (SEQ ID NO:48) and R66178_T3 (SEQ ID NO:49). Table 437 below describes the starting and ending position of this segment on each transcript.

TABLE 437 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
R66178_T2 (SEQ ID NO: 48) | 1624 | 1636
R66178_T3 (SEQ ID NO: 49) | 1624 | 1636
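
As an illustration only, the coordinates reported for transcript R66178_T2 in Tables 429, 430 and 433 to 437 can be checked for segment length and adjacency with a short script. This is a minimal sketch and not part of the described method: the function names and the dictionary are hypothetical, positions are assumed to be 1-based and inclusive, and the 120 bp cutoff is taken from the "up to about 120 bp" wording for short segments.

```python
# Segment coordinates on transcript R66178_T2, copied from Tables 429, 430
# and 433-437 (assumed 1-based, inclusive start and end positions).
T2_SEGMENTS = {
    "R66178_node_4": (713, 749),
    "R66178_node_5": (750, 761),
    "R66178_node_9": (1270, 1366),
    "R66178_node_11": (1367, 1484),
    "R66178_node_15": (1485, 1623),
    "R66178_node_16": (1624, 1636),
    "R66178_node_24": (1637, 3110),
}

SHORT_SEGMENT_MAX_BP = 120  # assumed cutoff for a "short" segment


def segment_length(start, end):
    """Length in nucleotides of an inclusive, 1-based interval."""
    return end - start + 1


def classify(segments, cutoff=SHORT_SEGMENT_MAX_BP):
    """Label each segment 'short' or 'long' by comparing its length to the cutoff."""
    return {
        name: "short" if segment_length(start, end) <= cutoff else "long"
        for name, (start, end) in segments.items()
    }


def adjacent_pairs(segments):
    """Return name pairs where the second segment starts right after the first ends."""
    ordered = sorted(segments.items(), key=lambda item: item[1][0])
    pairs = []
    for (name_a, (_, end_a)), (name_b, (start_b, _)) in zip(ordered, ordered[1:]):
        if start_b == end_a + 1:
            pairs.append((name_a, name_b))
    return pairs


if __name__ == "__main__":
    for name, label in classify(T2_SEGMENTS).items():
        print(name, segment_length(*T2_SEGMENTS[name]), label)
    print(adjacent_pairs(T2_SEGMENTS))
```

Under these assumptions, the segments listed in the short-segment description fall at or below the cutoff, while R66178_node_15 and R66178_node_24 exceed it.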

Segment cluster R66178_node_18 (SEQ ID NO:514) according to the present invention is supported by 13 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R66178_T3 (SEQ ID NO:49). Table 438 below describes the starting and ending position of this segment on each transcript.

TABLE 438 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
R66178_T3 (SEQ ID NO: 49) | 1637 | 1743

Segment cluster R66178_node_19 (SEQ ID NO:515) according to the present invention can be found in the following transcript(s): R66178_T3 (SEQ ID NO:49). Table 439 below describes the starting and ending position of this segment on each transcript.

TABLE 439 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
R66178_T3 (SEQ ID NO: 49) | 1744 | 1763

Segment cluster R66178_node_20 (SEQ ID NO:516) according to the present invention is supported by 12 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R66178_T3 (SEQ ID NO:49). Table 440 below describes the starting and ending position of this segment on each transcript.

TABLE 440 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
R66178_T3 (SEQ ID NO: 49) | 1764 | 1791

Segment cluster R66178_node_21 (SEQ ID NO:517) according to the present invention is supported by 11 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R66178_T3 (SEQ ID NO:49). Table 441 below describes the starting and ending position of this segment on each transcript.

TABLE 441 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
R66178_T3 (SEQ ID NO: 49) | 1792 | 1903

Variant protein alignment to the previously known protein:

Sequence name: PVR1_HUMAN (SEQ ID NO: 1432)
Sequence documentation: Alignment of: R66178_P3 (SEQ ID NO: 1324) × PVR1_HUMAN (SEQ ID NO: 1432)
. . .
Alignment segment 1/1:
Quality: 3826.00
Escore: 0
Matching length: 334
Total length: 334
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

Sequence name: PVR1_HUMAN (SEQ ID NO: 1432)
Sequence documentation: Alignment of: R66178_P4 (SEQ ID NO: 1325) × PVR1_HUMAN (SEQ ID NO: 1432)
. . .
Alignment segment 1/1:
Quality: 3294.00
Escore: 0
Matching length: 336
Total length: 336
Matching Percent Similarity: 99.70
Matching Percent Identity: 99.70
Total Percent Similarity: 99.70
Total Percent Identity: 99.70
Gaps: 0
Alignment:

Sequence name: PVR1_HUMAN (SEQ ID NO: 1432)
Sequence documentation: Alignment of: R66178_P8 (SEQ ID NO: 1326) × PVR1_HUMAN (SEQ ID NO: 1432)
. . .
Alignment segment 1/1:
Quality: 3250.00
Escore: 0
Matching length: 330
Total length: 330
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

Description for Cluster HUMPHOSLIP

Cluster HUMPHOSLIP features 7 transcript(s) and 53 segment(s) of interest, the names for which are given in Tables 442 and 443, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 444.

TABLE 442 Transcripts of interest

Transcript Name | Sequence ID No.
HUMPHOSLIP_PEA_2_T6 | 51
HUMPHOSLIP_PEA_2_T7 | 52
HUMPHOSLIP_PEA_2_T14 | 53
HUMPHOSLIP_PEA_2_T16 | 54
HUMPHOSLIP_PEA_2_T17 | 55
HUMPHOSLIP_PEA_2_T18 | 56
HUMPHOSLIP_PEA_2_T19 | 57

TABLE 443 Segments of interest

Segment Name | Sequence ID No.
HUMPHOSLIP_PEA_2_node_0 | 518
HUMPHOSLIP_PEA_2_node_19 | 519
HUMPHOSLIP_PEA_2_node_34 | 520
HUMPHOSLIP_PEA_2_node_68 | 521
HUMPHOSLIP_PEA_2_node_70 | 522
HUMPHOSLIP_PEA_2_node_75 | 523
HUMPHOSLIP_PEA_2_node_2 | 524
HUMPHOSLIP_PEA_2_node_3 | 525
HUMPHOSLIP_PEA_2_node_4 | 526
HUMPHOSLIP_PEA_2_node_6 | 527
HUMPHOSLIP_PEA_2_node_7 | 528
HUMPHOSLIP_PEA_2_node_8 | 529
HUMPHOSLIP_PEA_2_node_9 | 530
HUMPHOSLIP_PEA_2_node_14 | 531
HUMPHOSLIP_PEA_2_node_15 | 532
HUMPHOSLIP_PEA_2_node_16 | 533
HUMPHOSLIP_PEA_2_node_17 | 534
HUMPHOSLIP_PEA_2_node_23 | 535
HUMPHOSLIP_PEA_2_node_24 | 536
HUMPHOSLIP_PEA_2_node_25 | 537
HUMPHOSLIP_PEA_2_node_26 | 538
HUMPHOSLIP_PEA_2_node_29 | 539
HUMPHOSLIP_PEA_2_node_30 | 540
HUMPHOSLIP_PEA_2_node_33 | 541
HUMPHOSLIP_PEA_2_node_36 | 542
HUMPHOSLIP_PEA_2_node_37 | 543
HUMPHOSLIP_PEA_2_node_39 | 544
HUMPHOSLIP_PEA_2_node_40 | 545
HUMPHOSLIP_PEA_2_node_41 | 546
HUMPHOSLIP_PEA_2_node_42 | 547
HUMPHOSLIP_PEA_2_node_44 | 548
HUMPHOSLIP_PEA_2_node_45 | 549
HUMPHOSLIP_PEA_2_node_47 | 550
HUMPHOSLIP_PEA_2_node_51 | 551
HUMPHOSLIP_PEA_2_node_52 | 552
HUMPHOSLIP_PEA_2_node_53 | 553
HUMPHOSLIP_PEA_2_node_54 | 554
HUMPHOSLIP_PEA_2_node_55 | 555
HUMPHOSLIP_PEA_2_node_58 | 556
HUMPHOSLIP_PEA_2_node_59 | 557
HUMPHOSLIP_PEA_2_node_60 | 558
HUMPHOSLIP_PEA_2_node_61 | 559
HUMPHOSLIP_PEA_2_node_62 | 560
HUMPHOSLIP_PEA_2_node_63 | 561
HUMPHOSLIP_PEA_2_node_64 | 562
HUMPHOSLIP_PEA_2_node_65 | 563
HUMPHOSLIP_PEA_2_node_66 | 564
HUMPHOSLIP_PEA_2_node_67 | 565
HUMPHOSLIP_PEA_2_node_69 | 566
HUMPHOSLIP_PEA_2_node_71 | 567
HUMPHOSLIP_PEA_2_node_72 | 568
HUMPHOSLIP_PEA_2_node_73 | 569
HUMPHOSLIP_PEA_2_node_74 | 570

TABLE 444 Proteins of interest

Protein Name | Sequence ID No. | Corresponding Transcript(s)
HUMPHOSLIP_PEA_2_P10 | 1327 | HUMPHOSLIP_PEA_2_T17 (SEQ ID NO: 55)
HUMPHOSLIP_PEA_2_P12 | 1328 | HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)
HUMPHOSLIP_PEA_2_P30 | 1329 | HUMPHOSLIP_PEA_2_T6 (SEQ ID NO: 51)
HUMPHOSLIP_PEA_2_P31 | 1330 | HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)
HUMPHOSLIP_PEA_2_P33 | 1331 | HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)
HUMPHOSLIP_PEA_2_P34 | 1332 | HUMPHOSLIP_PEA_2_T16 (SEQ ID NO: 54)
HUMPHOSLIP_PEA_2_P35 | 1333 | HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)

These sequences are variants of the known protein Phospholipid transfer protein precursor (SwissProt accession identifier PLTP_HUMAN; known also according to the synonyms Lipid transfer protein II), SEQ ID NO: 1433, referred to herein as the previously known protein.

Protein Phospholipid transfer protein precursor (SEQ ID NO:1433) is known or believed to have the following function(s): Converts HDL into larger and smaller particles. May play a key role in extracellular phospholipid transport and modulation of HDL particles. The sequence for protein Phospholipid transfer protein precursor is given at the end of the application, as “Phospholipid transfer protein precursor amino acid sequence”. Known polymorphisms for this sequence are as shown in Table 445.

TABLE 445 Amino acid mutations for Known Protein

SNP position(s) on amino acid sequence | Comment
282 | R -> Q. /FTId = VAR_017020.
372 | R -> H. /FTId = VAR_017021.
380 | R -> W (in dbSNP: 6065903). /FTId = VAR_017022.
444 | F -> L (in dbSNP: 1804161). /FTId = VAR_012073.
487 | T -> K (in dbSNP: 1056929). /FTId = VAR_012074.
18 | E -> V

Protein Phospholipid transfer protein precursor (SEQ ID NO:1433) localization is believed to be Secreted.

The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: lipid metabolism; lipid transport, which are annotation(s) related to Biological Process; lipid binding, which are annotation(s) related to Molecular Function; and extracellular, which are annotation(s) related to Cellular Component.

The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <expasy dot ch/sprot/>; or LocusLink, available from <dot ncbi dot nlm dot nih dot gov/projects/LocusLink/>.

For this cluster, at least one oligonucleotide was found to demonstrate overexpression of the cluster, although not of at least one transcript/segment as listed below. Microarray (chip) data is also available for this cluster as follows. Various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer, as previously described. The following oligonucleotides were found to hit this cluster but not other segments/transcripts below, shown in Table 446, with regard to lung cancer.

TABLE 446 Oligonucleotides related to this cluster

Oligonucleotide name | Overexpressed in cancers | Chip reference
HUMPHOSLIP_0_0_18458 (SEQ ID NO: 224) | lung malignant tumors | LUN

As noted above, cluster HUMPHOSLIP features 7 transcript(s), which were listed in Table 442 above. These transcript(s) encode for protein(s) which are variant(s) of protein Phospholipid transfer protein precursor (SEQ ID NO:1433). A description of each variant protein according to the present invention is now provided.

Variant protein HUMPHOSLIP_PEA_2_P10 (SEQ ID NO:1327) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMPHOSLIP_PEA_2_T17 (SEQ ID NO:55). An alignment is given to the known protein (Phospholipid transfer protein precursor (SEQ ID NO:1433)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HUMPHOSLIP_PEA_2_P10 (SEQ ID NO:1327) and PLTP_HUMAN (SEQ ID NO:1433):

1. An isolated chimeric polypeptide encoding for HUMPHOSLIP_PEA_2_P10 (SEQ ID NO:1327), comprising a first amino acid sequence being at least 90% homologous to MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETITIPDLRGKEGHFYYNISE corresponding to amino acids 1-67 of PLTP_HUMAN (SEQ ID NO:1433), which also corresponds to amino acids 1-67 of HUMPHOSLIP_PEA_2_P10 (SEQ ID NO:1327), and a second amino acid sequence being at least 90% homologous to KVYDFLSTFITSGMRFLLNQQICPVLYHAGTVLLNSLLDTVPVRSSVDELVGIDYSLMKDPVASTSNLDMDFRGAFFPLTERNWSLPNRAVEPQLQEEERMVYVAFSEFFFDSAMESYFRAGALQLLLVGDKVPHDLDMLLRATYFGSIVLLSPAVIDSPLKLELRVLAPPRCTIKPSGTTISVTASVTIALVPPDQPEVQLSSMTMDARLSAKMALRGKALRTQLDLRRFRIYSNHSALESLALIPLQAPLKTMLQIGVMPMLNERTWRGVQIPLPEGINFVHEVVTNHAGFLTIGADLHFAKGLREVIEKNRPADVRASTAPTPSTAAV corresponding to amino acids 163-493 of PLTP_HUMAN (SEQ ID NO:1433), which also corresponds to amino acids 68-398 of HUMPHOSLIP_PEA_2_P10 (SEQ ID NO:1327), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated chimeric polypeptide encoding for an edge portion of HUMPHOSLIP_PEA_2_P10 (SEQ ID NO:1327), comprising a polypeptide having a length “n”, wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise EK, having a structure as follows: a sequence starting from any of amino acid numbers 67−x to 67; and ending at any of amino acid numbers 68+((n−2)−x), in which x varies from 0 to n−2.
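
The edge-portion definition above amounts to sliding a window of length n across the junction between amino acids 67 and 68 (the EK pair). The following is a minimal sketch of that enumeration, offered only as an illustration: the function name and parameters are hypothetical, positions are 1-based and inclusive, and the protein length of 398 is taken from the "amino acids 68-398" wording of the comparison report.

```python
def edge_windows(n, left=67, right=68, protein_length=398):
    """Enumerate (start, end) windows of length n that span the junction.

    For each x from 0 to n-2, the window starts at left - x and ends at
    right + ((n - 2) - x), mirroring the structure given in the text.
    Windows that would run off either end of the protein are skipped.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to cover both junction residues")
    windows = []
    for x in range(0, n - 1):  # x = 0 .. n-2
        start = left - x
        end = right + (n - 2) - x
        if start >= 1 and end <= protein_length:
            windows.append((start, end))
    return windows


if __name__ == "__main__":
    for start, end in edge_windows(10):
        print(start, end, end - start + 1)
```

Every window returned has exactly n residues and contains both position 67 and position 68, which is what makes the peptide specific to the splice junction rather than to either flanking region.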

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein HUMPHOSLIP_PEA_2_P10 (SEQ ID NO:1327) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 447 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPHOSLIP_PEA_2_P10 (SEQ ID NO:1327) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 447 Amino acid mutations

SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
16 | H -> R | Yes
18 | E -> V | Yes
113 | S -> F | Yes
118 | V -> | No
140 | R -> | No
140 | R -> P | No
150 | N -> | No
160 | P -> | No
201 | P -> | No
274 | M -> | No
285 | R -> W | Yes
292 | Q -> | No
315 | L -> * | No
330 | M -> I | Yes
349 | F -> L | Yes
392 | T -> K | Yes

The glycosylation sites of variant protein HUMPHOSLIP_PEA_2_P10 (SEQ ID NO:1327), as compared to the known protein Phospholipid transfer protein precursor (SEQ ID NO:1433), are described in Table 448 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).

TABLE 448 Glycosylation site(s)

Position(s) on known amino acid sequence | Present in variant protein? | Position in variant protein?
94 | no |
143 | no |
64 | yes | 64
245 | yes | 150
398 | yes | 303
117 | no |
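
The position shifts in Table 448 follow from the comparison report, in which amino acids 1-67 and 163-493 of PLTP_HUMAN correspond to amino acids 1-67 and 68-398 of the variant. A minimal sketch of that coordinate mapping is given below for illustration; the function and constant names are hypothetical and the code assumes 1-based positions and an exact, ungapped correspondence within each retained region.

```python
HEAD_END = 67        # last known-protein residue retained before the skipped region
TAIL_START = 163     # first known-protein residue retained after the skipped region
KNOWN_LENGTH = 493   # last known-protein residue named in the comparison report
OFFSET = TAIL_START - (HEAD_END + 1)  # 95 residues absent from the variant


def known_to_variant(position):
    """Map a position on the known protein to the variant, or None if absent."""
    if 1 <= position <= HEAD_END:
        return position
    if TAIL_START <= position <= KNOWN_LENGTH:
        return position - OFFSET
    return None


if __name__ == "__main__":
    for site in (94, 143, 64, 245, 398, 117):
        print(site, known_to_variant(site))
```

Under these assumptions the sites at 94, 117 and 143 fall in the skipped region, while 64, 245 and 398 map to 64, 150 and 303, in agreement with the table.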

Variant protein HUMPHOSLIP_PEA_2_P10 (SEQ ID NO:1327) is encoded by the following transcript(s): HUMPHOSLIP_PEA_2_T17 (SEQ ID NO:55), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMPHOSLIP_PEA_2_T17 (SEQ ID NO:55) is shown in bold; this coding portion starts at position 276 and ends at position 1469. The transcript also has the following SNPs as listed in Table 449 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPHOSLIP_PEA_2_P10 (SEQ ID NO:1327) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 449 Nucleic acid SNPs

SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
174 | G -> T | No
175 | A -> T | No
322 | A -> G | Yes
328 | A -> T | Yes
431 | G -> A | Yes
551 | C -> T | Yes
613 | C -> T | Yes
628 | T -> | No
694 | G -> | No
694 | G -> C | No
723 | A -> | No
753 | C -> | No
876 | C -> | No
1037 | C -> T | Yes
1097 | G -> | No
1128 | C -> T | Yes
1149 | C -> | No
1219 | T -> A | No
1230 | C -> T | Yes
1265 | G -> C | Yes
1322 | T -> A | Yes
1450 | C -> A | Yes
1469 | C -> T | No
1549 | C -> T | Yes
1565 | A -> G | No
1565 | A -> T | No
1630 | A -> G | Yes
1654 | T -> A | No
1731 | G -> T | Yes
1864 | G -> A | Yes
1893 | G -> T | Yes
2073 | G -> A | Yes
2269 | C -> T | Yes
2325 | G -> T | Yes
2465 | C -> T | Yes
2566 | C -> T | Yes
2881 | A -> G | No
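
The nucleotide positions in Table 449 can be related to the amino acid positions in Table 447 through the stated coding portion (positions 276 to 1469 of HUMPHOSLIP_PEA_2_T17). The sketch below illustrates the arithmetic only; the function name is hypothetical, and it assumes 1-based inclusive coordinates and a reading frame that begins at the first position of the coding portion.

```python
CDS_START = 276   # first nucleotide of the coding portion, as stated in the text
CDS_END = 1469    # last nucleotide of the coding portion, as stated in the text


def codon_number(nt_position, cds_start=CDS_START, cds_end=CDS_END):
    """Return the 1-based codon (amino acid) number for a transcript position.

    Positions outside the coding portion return None.
    """
    if not cds_start <= nt_position <= cds_end:
        return None
    return (nt_position - cds_start) // 3 + 1


if __name__ == "__main__":
    for nt in (322, 328, 613, 1128, 1265, 1322, 1450, 174, 1549):
        print(nt, codon_number(nt))
```

With these assumptions, nucleotide positions 322, 328, 613, 1128, 1265, 1322 and 1450 fall in codons 16, 18, 113, 285, 330, 349 and 392, the positions of the previously known amino acid changes in Table 447, while positions such as 174 and 1549 lie outside the coding portion.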

Variant protein HUMPHOSLIP_PEA_2_P12 (SEQ ID NO:1328) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). An alignment is given to the known protein (Phospholipid transfer protein precursor (SEQ ID NO:1433)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HUMPHOSLIP_PEA_2_P12 (SEQ ID NO:1328) and PLTP_HUMAN (SEQ ID NO:1433):

1. An isolated chimeric polypeptide encoding for HUMPHOSLIP_PEA_2_P12 (SEQ ID NO:1328), comprising a first amino acid sequence being at least 90% homologous to MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETITIPDLRGKEGHFYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLRFRRQLLYWFFYDGGYINASAEGVSIRTGLELSRDPAGRMKVSNVSCQASVSRMHAAFGGTFKKVYDFLSTFITSGMRFLLNQQICPVLYHAGTVLLNSLLDTVPVRSSVDELVGIDYSLMKDPVASTSNLDMDFRGAFFPLTERNWSLPNRAVEPQLQEEERMVYVAFSEFFFDSAMESYFRAGALQLLLVGDKVPHDLDMLLRATYFGSIVLLSPAVIDSPLKLELRVLAPPRCTIKPSGTTISVTASVTIALVPPDQPEVQLSSMTMDARLSAKMALRGKALRTQLDLRRFRIYSNHSALESLALIPLQAPLKTMLQIGVMPMLN corresponding to amino acids 1-427 of PLTP_HUMAN (SEQ ID NO:1433), which also corresponds to amino acids 1-427 of HUMPHOSLIP_PEA_2_P12 (SEQ ID NO:1328), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GKAGV (SEQ ID NO: 263) corresponding to amino acids 428-432 of HUMPHOSLIP_PEA_2_P12 (SEQ ID NO:1328), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of HUMPHOSLIP_PEA_2_P12 (SEQ ID NO:1328), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GKAGV (SEQ ID NO: 263) in HUMPHOSLIP_PEA_2_P12 (SEQ ID NO:1328).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein HUMPHOSLIP_PEA_2_P12 (SEQ ID NO:1328) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 450 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPHOSLIP_PEA_2_P12 (SEQ ID NO:1328) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 450 Amino acid mutations

SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
16 | H -> R | Yes
18 | E -> V | Yes
81 | D -> H | Yes
124 | S -> Y | Yes
160 | T -> | No
160 | T -> N | No
208 | S -> F | Yes
213 | V -> | No
235 | R -> P | No
235 | R -> | No
245 | N -> | No
255 | P -> | No
296 | P -> | No
369 | M -> | No
380 | R -> W | Yes
387 | Q -> | No
410 | L -> * | No
425 | M -> I | Yes

The glycosylation sites of variant protein HUMPHOSLIP_PEA_2_P12 (SEQ ID NO:1328), as compared to the known protein Phospholipid transfer protein precursor (SEQ ID NO:1433), are described in Table 451 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).

TABLE 451 Glycosylation site(s)

Position(s) on known amino acid sequence | Present in variant protein? | Position in variant protein?
94 | yes | 94
143 | yes | 143
64 | yes | 64
245 | yes | 245
398 | yes | 398
117 | yes | 117

Variant protein HUMPHOSLIP_PEA_2_P12 (SEQ ID NO:1328) is encoded by the following transcript(s): HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57) is shown in bold; this coding portion starts at position 276 and ends at position 1571. The transcript also has the following SNPs as listed in Table 452 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPHOSLIP_PEA_2_P12 (SEQ ID NO:1328) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 452 Nucleic acid SNPs

SNP position(s) on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
174 | G -> T | No
175 | A -> T | No
322 | A -> G | Yes
328 | A -> T | Yes
431 | G -> A | Yes
516 | G -> C | Yes
644 | G -> A | Yes
646 | C -> A | Yes
754 | C -> | No
754 | C -> A | No
836 | C -> T | Yes
898 | C -> T | Yes
913 | T -> | No
979 | G -> | No
979 | G -> C | No
1008 | A -> | No
1038 | C -> | No
1161 | C -> | No
1322 | C -> T | Yes
1382 | G -> | No
1413 | C -> T | Yes
1434 | C -> | No
1504 | T -> A | No
1515 | C -> T | Yes
1550 | G -> C | Yes
1690 | T -> A | Yes
1818 | C -> A | Yes
1837 | C -> T | No
1917 | C -> T | Yes
1933 | A -> G | No
1933 | A -> T | No
1998 | A -> G | Yes
2022 | T -> A | No
2099 | G -> T | Yes
2232 | G -> A | Yes
2261 | G -> T | Yes
2441 | G -> A | Yes
2637 | C -> T | Yes
2693 | G -> T | Yes
2833 | C -> T | Yes
2934 | C -> T | Yes
3249 | A -> G | No

Variant protein HUMPHOSLIP_PEA_2_P30 (SEQ ID NO:1329) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMPHOSLIP_PEA_2_T6 (SEQ ID NO:51). The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein HUMPHOSLIP_PEA_2_P30 (SEQ ID NO:1329) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 453 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPHOSLIP_PEA_2_P30 (SEQ ID NO:1329) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 453 Amino acid mutations

SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
16 | H -> R | Yes
18 | E -> V | Yes
37 | R -> Q | Yes

Variant protein HUMPHOSLIP_PEA_2_P30 (SEQ ID NO:1329) is encoded by the following transcript(s): HUMPHOSLIP_PEA_2_T6 (SEQ ID NO:51), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMPHOSLIP_PEA_2_T6 (SEQ ID NO:51) is shown in bold; this coding portion starts at position 276 and ends at position 431. The transcript also has the following SNPs as listed in Table 454 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPHOSLIP_PEA_2_P30 (SEQ ID NO:1329) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 454 Nucleic acid SNPs

SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
174 | G -> T | No
175 | A -> T | No
322 | A -> G | Yes
328 | A -> T | Yes
385 | G -> A | Yes
470 | G -> C | Yes
598 | G -> A | Yes
600 | C -> A | Yes
708 | C -> | No
708 | C -> A | No
790 | C -> T | Yes
852 | C -> T | Yes
867 | T -> | No
933 | G -> | No
933 | G -> C | No
962 | A -> | No
992 | C -> | No
1115 | C -> | No
1276 | C -> T | Yes
1336 | G -> | No
1367 | C -> T | Yes
1388 | C -> | No
1458 | T -> A | No
1469 | C -> T | Yes
1504 | G -> C | Yes
1561 | T -> A | Yes
1689 | C -> A | Yes
1708 | C -> T | No
1788 | C -> T | Yes
1804 | A -> G | No
1804 | A -> T | No
1869 | A -> G | Yes
1893 | T -> A | No
1970 | G -> T | Yes
2103 | G -> A | Yes
2132 | G -> T | Yes
2312 | G -> A | Yes
2508 | C -> T | Yes
2564 | G -> T | Yes
2704 | C -> T | Yes
2805 | C -> T | Yes
3120 | A -> G | No

Variant protein HUMPHOSLIP_PEA_2_P31 (SEQ ID NO:1330) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52). An alignment is given to the known protein (Phospholipid transfer protein precursor (SEQ ID NO:1433)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HUMPHOSLIP_PEA_2_P31 (SEQ ID NO:1330) and PLTP_HUMAN (SEQ ID NO:1433):

1. An isolated chimeric polypeptide encoding for HUMPHOSLIP_PEA_2_P31 (SEQ ID NO:1330), comprising a first amino acid sequence being at least 90% homologous to MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETITIPDLRGKEGHFYYNISE corresponding to amino acids 1-67 of PLTP_HUMAN (SEQ ID NO:1433), which also corresponds to amino acids 1-67 of HUMPHOSLIP_PEA_2_P31 (SEQ ID NO:1330), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence PGLERGADKFPVVGGSSLFLALDLTLRPPVG (SEQ ID NO: 264) corresponding to amino acids 68-98 of HUMPHOSLIP_PEA_2_P31 (SEQ ID NO:1330), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of HUMPHOSLIP_PEA_2_P31 (SEQ ID NO:1330), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence PGLERGADKFPVVGGSSLFLALDLTLRPPVG (SEQ ID NO: 264) in HUMPHOSLIP_PEA_2_P31 (SEQ ID NO:1330).
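
The head-plus-tail structure described in the comparison report above can be illustrated by joining the two quoted sequences and checking the stated coordinates. This is a minimal sketch under the assumption of exact (100% identical) head and tail sequences, whereas the text also covers sequences that are merely homologous to them; the function names are hypothetical and positions are 1-based and inclusive.

```python
# First amino acid sequence quoted in the comparison report
# (amino acids 1-67 of PLTP_HUMAN).
HEAD = "MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETITIPDLRGKEGHFYYNISE"
# Unique tail quoted in the comparison report (SEQ ID NO: 264).
TAIL = "PGLERGADKFPVVGGSSLFLALDLTLRPPVG"


def assemble(head, tail):
    """Join head and tail so that they are contiguous and in sequential order."""
    return head + tail


def region(sequence, start, end):
    """Return residues start..end of a sequence using 1-based inclusive positions."""
    return sequence[start - 1:end]


if __name__ == "__main__":
    variant = assemble(HEAD, TAIL)
    print(len(HEAD), len(TAIL), len(variant))
    print(region(variant, 68, 98) == TAIL)
```

The head has 67 residues and the tail 31, so the assembled sequence has 98 residues and its positions 68-98 reproduce the tail, consistent with the coordinates given in the report.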

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein HUMPHOSLIP_PEA_2_P31 (SEQ ID NO:1330) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 455 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPHOSLIP_PEA_2_P31 (SEQ ID NO:1330) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 455 Amino acid mutations

SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
16 | H -> R | Yes
18 | E -> V | Yes

The glycosylation sites of variant protein HUMPHOSLIP_PEA_2_P31 (SEQ ID NO:1330), as compared to the known protein Phospholipid transfer protein precursor (SEQ ID NO:1433), are described in Table 456 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).

TABLE 456 Glycosylation site(s)

Position(s) on known amino acid sequence | Present in variant protein? | Position in variant protein?
94 | no |
143 | no |
64 | yes | 64
245 | no |
398 | no |
117 | no |

Variant protein HUMPHOSLIP_PEA_2_P31 (SEQ ID NO:1330) is encoded by the following transcript(s): HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52) is shown in bold; this coding portion starts at position 276 and ends at position 569. The transcript also has the following SNPs as listed in Table 457 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPHOSLIP_PEA_2_P31 (SEQ ID NO:1330) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 457 Nucleic acid SNPs

SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
174 | G -> T | No
175 | A -> T | No
322 | A -> G | Yes
328 | A -> T | Yes
431 | G -> A | Yes
608 | G -> C | Yes
736 | G -> A | Yes
738 | C -> A | Yes
846 | C -> | No
846 | C -> A | No
928 | C -> T | Yes
990 | C -> T | Yes
1005 | T -> | No
1071 | G -> | No
1071 | G -> C | No
1100 | A -> | No
1130 | C -> | No
1253 | C -> | No
1414 | C -> T | Yes
1474 | G -> | No
1505 | C -> T | Yes
1526 | C -> | No
1596 | T -> A | No
1607 | C -> T | Yes
1642 | G -> C | Yes
1699 | T -> A | Yes
1827 | C -> A | Yes
1846 | C -> T | No
1926 | C -> T | Yes
1942 | A -> G | No
1942 | A -> T | No
2007 | A -> G | Yes
2031 | T -> A | No
2108 | G -> T | Yes
2241 | G -> A | Yes
2270 | G -> T | Yes
2450 | G -> A | Yes
2646 | C -> T | Yes
2702 | G -> T | Yes
2842 | C -> T | Yes
2943 | C -> T | Yes
3258 | A -> G | No
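
As a consistency check on the coding portions stated above for the HUMPHOSLIP transcripts, the number of codons in each span can be compared with the variant protein lengths implied by the comparison reports. The sketch below is illustrative only; the dictionary and function names are hypothetical, coordinates are assumed to be 1-based and inclusive, and no protein length is assumed for HUMPHOSLIP_PEA_2_P30 because the text does not state one.

```python
# Coding portions (start, end) as stated in the text for each transcript.
CODING_SPANS = {
    "HUMPHOSLIP_PEA_2_T17": (276, 1469),
    "HUMPHOSLIP_PEA_2_T19": (276, 1571),
    "HUMPHOSLIP_PEA_2_T6": (276, 431),
    "HUMPHOSLIP_PEA_2_T7": (276, 569),
}

# Last amino acid position named in each comparison report.
REPORTED_PROTEIN_LENGTHS = {
    "HUMPHOSLIP_PEA_2_T17": 398,  # HUMPHOSLIP_PEA_2_P10, amino acids 68-398
    "HUMPHOSLIP_PEA_2_T19": 432,  # HUMPHOSLIP_PEA_2_P12, amino acids 428-432
    "HUMPHOSLIP_PEA_2_T7": 98,    # HUMPHOSLIP_PEA_2_P31, amino acids 68-98
}


def codons_in_span(start, end):
    """Number of whole codons in an inclusive span; raises if not a multiple of 3."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("coding span length is not a multiple of three")
    return length // 3


if __name__ == "__main__":
    for transcript, (start, end) in CODING_SPANS.items():
        codons = codons_in_span(start, end)
        reported = REPORTED_PROTEIN_LENGTHS.get(transcript, "not stated")
        print(transcript, codons, reported)
```

For the three transcripts with a reported protein length, the codon count equals that length, which suggests the stated spans cover the amino acid codons only; that reading is an inference from the arithmetic rather than a statement made in the text.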

Variant protein HUMPHOSLIP_PEA_2_P33 (SEQ ID NO:1331) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53). An alignment is given to the known protein (Phospholipid transfer protein precursor (SEQ ID NO:1433)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HUMPHOSLIP_PEA_2_P33 (SEQ ID NO:1331) and PLTP_HUMAN (SEQ ID NO:1433):

1. An isolated chimeric polypeptide encoding for HUMPHOSLIP_PEA_2_P33 (SEQ ID NO:1331), comprising a first amino acid sequence being at least 90% homologous to MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETITIPDLRGKEGHFYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLRFRRQLLYWFFYDGGYINASAEGVSIRTGLELSRDPAGRMKVSNVSCQASVSRMHAAFGGTFKKVYDFLSTFITSGMRFLLNQQ corresponding to amino acids 1-183 of PLTP_HUMAN (SEQ ID NO:1433), which also corresponds to amino acids 1-183 of HUMPHOSLIP_PEA_2_P33 (SEQ ID NO:1331), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VWAATGRRVARVGMLSL (SEQ ID NO: 265) corresponding to amino acids 184-200 of HUMPHOSLIP_PEA_2_P33 (SEQ ID NO:1331), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of HUMPHOSLIP_PEA_2_P33 (SEQ ID NO:1331), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VWAATGRRVARVGMLSL (SEQ ID NO: 265) in HUMPHOSLIP_PEA_2_P33 (SEQ ID NO:1331).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein HUMPHOSLIP_PEA_2_P33 (SEQ ID NO:1331) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 458 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPHOSLIP_PEA_2_P33 (SEQ ID NO:1331) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 458 Amino acid mutations

SNP position(s) on amino acid sequence   Alternative amino acid(s)   Previously known SNP?
16                                       H -> R                      Yes
18                                       E -> V                      Yes
81                                       D -> H                      Yes
124                                      S -> Y                      Yes
160                                      T ->                        No
160                                      T -> N                      No

The glycosylation sites of variant protein HUMPHOSLIP_PEA_2_P33 (SEQ ID NO:1331), as compared to the known protein Phospholipid transfer protein precursor (SEQ ID NO:1433), are described in Table 459 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).

TABLE 459 Glycosylation site(s)

Position(s) on known amino acid sequence   Present in variant protein?   Position in variant protein?
94                                         yes                           94
143                                        yes                           143
64                                         yes                           64
245                                        no
398                                        no
117                                        yes                           117

Variant protein HUMPHOSLIP_PEA_2_P33 (SEQ ID NO:1331) is encoded by the following transcript(s): HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53) is shown in bold; this coding portion starts at position 276 and ends at position 875. The transcript also has the following SNPs as listed in Table 460 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPHOSLIP_PEA_2_P33 (SEQ ID NO:1331) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 460 Nucleic acid SNPs

SNP position on nucleotide sequence   Alternative nucleic acid   Previously known SNP?
174                                   G -> T                     No
175                                   A -> T                     No
322                                   A -> G                     Yes
328                                   A -> T                     Yes
431                                   G -> A                     Yes
516                                   G -> C                     Yes
644                                   G -> A                     Yes
646                                   C -> A                     Yes
754                                   C ->                       No
754                                   C -> A                     No
921                                   C -> T                     Yes
983                                   C -> T                     Yes
998                                   T ->                       No
1064                                  G ->                       No
1064                                  G -> C                     No
1093                                  A ->                       No
1123                                  C ->                       No
1246                                  C ->                       No
1407                                  C -> T                     Yes
1467                                  G ->                       No
1498                                  C -> T                     Yes
1519                                  C ->                       No
1589                                  T -> A                     No
1600                                  C -> T                     Yes
1635                                  G -> C                     Yes
1692                                  T -> A                     Yes
1820                                  C -> A                     Yes
1839                                  C -> T                     No
1919                                  C -> T                     Yes
1935                                  A -> G                     No
1935                                  A -> T                     No
2000                                  A -> G                     Yes
2024                                  T -> A                     No
2101                                  G -> T                     Yes
2234                                  G -> A                     Yes
2263                                  G -> T                     Yes
2443                                  G -> A                     Yes
2639                                  C -> T                     Yes
2695                                  G -> T                     Yes
2835                                  C -> T                     Yes
2936                                  C -> T                     Yes
3251                                  A -> G                     No
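The nucleotide positions in Table 460 and the amino acid positions in Table 458 can be related through the stated coding start (position 276) of transcript HUMPHOSLIP_PEA_2_T14. The following is a minimal sketch of that arithmetic, assuming 1-based positions and a reading frame that begins at the first coding nucleotide; the helper name codon_of is hypothetical and is not part of the application.

```python
# Hypothetical helper for illustration only.
# CDS_START and CDS_END are the coding-portion positions stated for T14.
CDS_START = 276
CDS_END = 875


def codon_of(nt_pos, cds_start=CDS_START, cds_end=CDS_END):
    """Map a 1-based transcript position to (amino acid position, position in codon).

    Both returned values are 1-based. Returns None when the position lies
    outside the coding portion.
    """
    if not cds_start <= nt_pos <= cds_end:
        return None
    offset = nt_pos - cds_start
    return offset // 3 + 1, offset % 3 + 1


# Nucleotide SNP positions from Table 460 that fall inside the coding portion.
for nt in (322, 328, 431, 516, 644, 646, 754):
    print(nt, codon_of(nt))
```

Under these assumptions, positions 322, 328, 516, 646 and 754 map to amino acid positions 16, 18, 81, 124 and 160, which are the positions listed in Table 458. Positions 431 and 644 map to amino acid positions 52 and 123, which have no entry in that table of non-silent SNPs.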

Variant protein HUMPHOSLIP_PEA_2_P34 (SEQ ID NO:1332) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54). An alignment is given to the known protein (Phospholipid transfer protein precursor (SEQ ID NO:1433)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HUMPHOSLIP_PEA_2_P34 (SEQ ID NO:1332) and PLTP_HUMAN (SEQ ID NO:1433):

1. An isolated chimeric polypeptide encoding for HUMPHOSLIP_PEA_2_P34 (SEQ ID NO:1332), comprising a first amino acid sequence being at least 90% homologous to MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETITIPDLRGKEGHFYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLRFRRQLLYWFFYDGGYINASAEGVSIRTGLELSRDPAGRMKVSNVSCQASVSRMHAAFGGTFKKVYDFLSTFITSGMRFLLNQQICPVLYHAGTVLLNSLLDTVPV corresponding to amino acids 1-205 of PLTP_HUMAN (SEQ ID NO:1433), which also corresponds to amino acids 1-205 of HUMPHOSLIP_PEA_2_P34 (SEQ ID NO:1332), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LWTSLLALTIPS (SEQ ID NO: 266) corresponding to amino acids 206-217 of HUMPHOSLIP_PEA_2_P34 (SEQ ID NO:1332), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of HUMPHOSLIP_PEA_2_P34 (SEQ ID NO:1332), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LWTSLLALTIPS (SEQ ID NO: 266) in HUMPHOSLIP_PEA_2_P34 (SEQ ID NO:1332).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein HUMPHOSLIP_PEA_2_P34 (SEQ ID NO:1332) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 461 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPHOSLIP_PEA_2_P34 (SEQ ID NO:1332) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 461 Amino acid mutations

SNP position(s) on amino acid sequence   Alternative amino acid(s)   Previously known SNP?
16                                       H -> R                      Yes
18                                       E -> V                      Yes
81                                       D -> H                      Yes
124                                      S -> Y                      Yes
160                                      T ->                        No
160                                      T -> N                      No
211                                      L ->                        No

The glycosylation sites of variant protein HUMPHOSLIP_PEA_2_P34 (SEQ ID NO:1332), as compared to the known protein Phospholipid transfer protein precursor (SEQ ID NO:1433), are described in Table 462 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).

TABLE 462 Glycosylation site(s)

Position(s) on known amino acid sequence   Present in variant protein?   Position in variant protein?
94                                         yes                           94
143                                        yes                           143
64                                         yes                           64
245                                        no
398                                        no
117                                        yes                           117

Variant protein HUMPHOSLIP_PEA_2_P34 (SEQ ID NO:1332) is encoded by the following transcript(s): HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54) is shown in bold; this coding portion starts at position 276 and ends at position 926. The transcript also has the following SNPs as listed in Table 463 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPHOSLIP_PEA_2_P34 (SEQ ID NO:1332) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 463 Nucleic acid SNPs

SNP position on nucleotide sequence   Alternative nucleic acid   Previously known SNP?
174                                   G -> T                     No
175                                   A -> T                     No
322                                   A -> G                     Yes
328                                   A -> T                     Yes
431                                   G -> A                     Yes
516                                   G -> C                     Yes
644                                   G -> A                     Yes
646                                   C -> A                     Yes
754                                   C ->                       No
754                                   C -> A                     No
836                                   C -> T                     Yes
891                                   C -> T                     Yes
906                                   T ->                       No
972                                   G ->                       No
972                                   G -> C                     No
1001                                  A ->                       No
1031                                  C ->                       No
1154                                  C ->                       No
1315                                  C -> T                     Yes
1375                                  G ->                       No
1406                                  C -> T                     Yes
1427                                  C ->                       No
1497                                  T -> A                     No
1508                                  C -> T                     Yes
1543                                  G -> C                     Yes
1600                                  T -> A                     Yes
1728                                  C -> A                     Yes
1747                                  C -> T                     No
1827                                  C -> T                     Yes
1843                                  A -> G                     No
1843                                  A -> T                     No
1908                                  A -> G                     Yes
1932                                  T -> A                     No
2009                                  G -> T                     Yes
2142                                  G -> A                     Yes
2171                                  G -> T                     Yes
2351                                  G -> A                     Yes
2547                                  C -> T                     Yes
2603                                  G -> T                     Yes
2743                                  C -> T                     Yes
2844                                  C -> T                     Yes
3159                                  A -> G                     No

Variant protein HUMPHOSLIP_PEA_2_P35 (SEQ ID NO:1333) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56). An alignment is given to the known protein (Phospholipid transfer protein precursor (SEQ ID NO:1433)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HUMPHOSLIP_PEA_2_P35 (SEQ ID NO:1333) and PLTP_HUMAN (SEQ ID NO:1433):

1. An isolated chimeric polypeptide encoding for HUMPHOSLIP_PEA_2_P35 (SEQ ID NO:1333), comprising a first amino acid sequence being at least 90% homologous to MALFGALFLALLAGAHAEFPGCKIRVTSKALELVKQEGLRFLEQELETITIPDLRGKEGHFYYNISEVKVTELQLTSSELDFQPQQELMLQITNASLGLRFRRQLLYWF corresponding to amino acids 1-109 of PLTP_HUMAN (SEQ ID NO:1433), which also corresponds to amino acids 1-109 of HUMPHOSLIP_PEA_2_P35 (SEQ ID NO:1333), a second amino acid sequence being a bridging amino acid sequence comprising L, a third amino acid sequence being at least 90% homologous to KVYDFLSTFITSGMRFLLNQQ corresponding to amino acids 163-183 of PLTP_HUMAN (SEQ ID NO:1433), which also corresponds to amino acids 111-131 of HUMPHOSLIP_PEA_2_P35 (SEQ ID NO:1333), and a fourth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VWAATGRRVARVGMLSL (SEQ ID NO: 265) corresponding to amino acids 132-148 of HUMPHOSLIP_PEA_2_P35 (SEQ ID NO:1333), wherein said first amino acid sequence, second amino acid sequence, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for an edge portion of HUMPHOSLIP_PEA_2_P35 (SEQ ID NO:1333), comprising a polypeptide having a length “n”, wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise FLK having a structure as follows (numbering according to HUMPHOSLIP_PEA_2_P35 (SEQ ID NO:1333)): a sequence starting from any of amino acid numbers 109−x to 109; and ending at any of amino acid numbers 111+((n−2)−x), in which x varies from 0 to n−2.

3. An isolated polypeptide encoding for a tail of HUMPHOSLIP_PEA_2_P35 (SEQ ID NO:1333), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VWAATGRRVARVGMLSL (SEQ ID NO: 265) in HUMPHOSLIP_PEA_2_P35 (SEQ ID NO:1333).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein HUMPHOSLIP_PEA_2_P35 (SEQ ID NO:1333) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 464 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPHOSLIP_PEA_2_P35 (SEQ ID NO:1333) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 464 Amino acid mutations

SNP position(s) on amino acid sequence   Alternative amino acid(s)   Previously known SNP?
16                                       H -> R                      Yes
18                                       E -> V                      Yes
81                                       D -> H                      Yes

The glycosylation sites of variant protein HUMPHOSLIP_PEA_2_P35 (SEQ ID NO:1333), as compared to the known protein Phospholipid transfer protein precursor (SEQ ID NO:1433), are described in Table 465 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).

TABLE 465 Glycosylation site(s)

Position(s) on known amino acid sequence   Present in variant protein?   Position in variant protein?
94                                         yes                           94
143                                        no
64                                         yes                           64
245                                        no
398                                        no
117                                        no
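The pattern in Table 465 is consistent with the comparison report for HUMPHOSLIP_PEA_2_P35, in which amino acids 1-109 and 163-183 of PLTP_HUMAN correspond to amino acids 1-109 and 111-131 of the variant. The sketch below maps known-protein positions onto the variant by coordinates alone. The names SHARED_REGIONS_P35 and map_site are hypothetical, and a coordinate match is only an assumption about site retention, since the application does not state how presence was determined.

```python
# (known_start, known_end, variant_start) for each region of PLTP_HUMAN that the
# comparison report says is shared with HUMPHOSLIP_PEA_2_P35. Hypothetical name.
SHARED_REGIONS_P35 = [
    (1, 109, 1),
    (163, 183, 111),
]


def map_site(known_pos, regions):
    """Return the variant position for a known-protein position, or None if absent."""
    for known_start, known_end, variant_start in regions:
        if known_start <= known_pos <= known_end:
            return variant_start + (known_pos - known_start)
    return None


# Glycosylation site positions on the known protein, in the order of Table 465.
for site in (94, 143, 64, 245, 398, 117):
    variant_pos = map_site(site, SHARED_REGIONS_P35)
    present = "yes" if variant_pos is not None else "no"
    print(site, present, variant_pos if variant_pos is not None else "")
```

With these regions, sites 94 and 64 keep their positions, while sites 143 and 117 fall in the stretch of the known protein (amino acids 110-162) that has no counterpart in the variant, and sites 245 and 398 lie beyond amino acid 183.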

Variant protein HUMPHOSLIP_PEA_2_P35 (SEQ ID NO:1333) is encoded by the following transcript(s): HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56) is shown in bold; this coding portion starts at position 276 and ends at position 719. The transcript also has the following SNPs as listed in Table 466 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPHOSLIP_PEA_2_P35 (SEQ ID NO:1333) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 466 Nucleic acid SNPs

SNP position on nucleotide sequence   Alternative nucleic acid   Previously known SNP?
174                                   G -> T                     No
175                                   A -> T                     No
322                                   A -> G                     Yes
328                                   A -> T                     Yes
431                                   G -> A                     Yes
516                                   G -> C                     Yes
765                                   C -> T                     Yes
827                                   C -> T                     Yes
842                                   T ->                       No
908                                   G ->                       No
908                                   G -> C                     No
937                                   A ->                       No
967                                   C ->                       No
1090                                  C ->                       No
1251                                  C -> T                     Yes
1311                                  G ->                       No
1342                                  C -> T                     Yes
1363                                  C ->                       No
1433                                  T -> A                     No
1444                                  C -> T                     Yes
1479                                  G -> C                     Yes
1536                                  T -> A                     Yes
1664                                  C -> A                     Yes
1683                                  C -> T                     No
1763                                  C -> T                     Yes
1779                                  A -> G                     No
1779                                  A -> T                     No
1844                                  A -> G                     Yes
1868                                  T -> A                     No
1945                                  G -> T                     Yes
2078                                  G -> A                     Yes
2107                                  G -> T                     Yes
2287                                  G -> A                     Yes
2483                                  C -> T                     Yes
2539                                  G -> T                     Yes
2679                                  C -> T                     Yes
2780                                  C -> T                     Yes
3095                                  A -> G                     No
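The four-part layout given for HUMPHOSLIP_PEA_2_P35 in the comparison report can be cross-checked against the coding portion stated for transcript HUMPHOSLIP_PEA_2_T18 (positions 276 to 719). The following sketch is illustrative only: the names P35_PARTS, protein_length and codon_count are hypothetical, and position 110 for the bridging L is inferred from its neighbours (109 and 111) rather than stated in the text.

```python
# Layout of HUMPHOSLIP_PEA_2_P35 as (label, first residue, last residue), 1-based.
# The bridging L at position 110 is an inference from the adjacent ranges.
P35_PARTS = [
    ("PLTP_HUMAN 1-109", 1, 109),
    ("bridging L", 110, 110),
    ("PLTP_HUMAN 163-183", 111, 131),
    ("unique tail", 132, 148),
]


def protein_length(parts):
    """Check that the parts are contiguous from residue 1 and return the total length."""
    expected_start = 1
    for name, start, end in parts:
        if start != expected_start or end < start:
            raise ValueError(f"part {name!r} is not contiguous with the previous one")
        expected_start = end + 1
    return expected_start - 1


def codon_count(cds_start, cds_end):
    """Number of whole codons in a 1-based, inclusive coding range."""
    length = cds_end - cds_start + 1
    if length % 3:
        raise ValueError("coding range is not a multiple of three")
    return length // 3


print(protein_length(P35_PARTS))
print(codon_count(276, 719))
print(len("KVYDFLSTFITSGMRFLLNQQ"), len("VWAATGRRVARVGMLSL"))
```

Both counts come to 148, and the two quoted peptide sequences have the 21 and 17 residues implied by their stated ranges. The agreement suggests that the stated end of the coding portion does not include a stop codon, although the application does not say so explicitly.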

As noted above, cluster HUMPHOSLIP features 53 segment(s), which were listed in Table 443 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster HUMPHOSLIP_PEA_2_node_0 (SEQ ID NO:518) according to the present invention is supported by 150 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6 (SEQ ID NO:51), HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54), HUMPHOSLIP_PEA_2_T17 (SEQ ID NO:55), HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56) and HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 467 below describes the starting and ending position of this segment on each transcript.

TABLE 467 Segment location on transcripts

Transcript name                        Segment starting position   Segment ending position
HUMPHOSLIP_PEA_2_T6 (SEQ ID NO: 51)    1                           264
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)    1                           264
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)   1                           264
HUMPHOSLIP_PEA_2_T16 (SEQ ID NO: 54)   1                           264
HUMPHOSLIP_PEA_2_T17 (SEQ ID NO: 55)   1                           264
HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)   1                           264
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)   1                           264

Segment cluster HUMPHOSLIP_PEA_2_node_19 (SEQ ID NO:519) according to the present invention is supported by 186 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6 (SEQ ID NO:51), HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54) and HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 468 below describes the starting and ending position of this segment on each transcript.

TABLE 468 Segment location on transcripts

Transcript name                        Segment starting position   Segment ending position
HUMPHOSLIP_PEA_2_T6 (SEQ ID NO: 51)    559                         714
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)    697                         852
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)   605                         760
HUMPHOSLIP_PEA_2_T16 (SEQ ID NO: 54)   605                         760
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)   605                         760

Segment cluster HUMPHOSLIP_PEA_2_node_34 (SEQ ID NO:520) according to the present invention is supported by 191 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6 (SEQ ID NO:51), HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54), HUMPHOSLIP_PEA_2_T17 (SEQ ID NO:55), HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56) and HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 469 below describes the starting and ending position of this segment on each transcript.

TABLE 469 Segment location on transcripts

Transcript name                        Segment starting position   Segment ending position
HUMPHOSLIP_PEA_2_T6 (SEQ ID NO: 51)    971                         1111
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)    1109                        1249
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)   1102                        1242
HUMPHOSLIP_PEA_2_T16 (SEQ ID NO: 54)   1010                        1150
HUMPHOSLIP_PEA_2_T17 (SEQ ID NO: 55)   732                         872
HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)   946                         1086
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)   1017                        1157
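Because a segment is a single nucleic acid sequence shared between transcripts, its length should be the same wherever it occurs, even though its starting position shifts from transcript to transcript. The sketch below checks this for the positions in Table 469, assuming 1-based inclusive coordinates; the names NODE_34 and segment_lengths are hypothetical and the transcript names are abbreviated.

```python
# Start and end positions of segment node_34 on each transcript, from Table 469.
# Keys are abbreviated transcript names (T6 stands for HUMPHOSLIP_PEA_2_T6, etc.).
NODE_34 = {
    "T6": (971, 1111),
    "T7": (1109, 1249),
    "T14": (1102, 1242),
    "T16": (1010, 1150),
    "T17": (732, 872),
    "T18": (946, 1086),
    "T19": (1017, 1157),
}


def segment_lengths(locations):
    """Return the segment length on each transcript for 1-based inclusive ranges."""
    return {name: end - start + 1 for name, (start, end) in locations.items()}


lengths = segment_lengths(NODE_34)
print(lengths)
print("consistent:", len(set(lengths.values())) == 1)
```

For Table 469 every entry works out to 141 nucleotides. A mismatch in such a check would point to a transcription error in the table rather than to a biological difference.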

Segment cluster HUMPHOSLIP_PEA_2_node_68 (SEQ ID NO:521) according to the present invention is supported by 131 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6 (SEQ ID NO:51), HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54), HUMPHOSLIP_PEA_2_T17 (SEQ ID NO:55), HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56) and HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 470 below describes the starting and ending position of this segment on each transcript.

TABLE 470 Segment location on transcripts

Transcript name                        Segment starting position   Segment ending position
HUMPHOSLIP_PEA_2_T6 (SEQ ID NO: 51)    1867                        2285
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)    2005                        2423
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)   1998                        2416
HUMPHOSLIP_PEA_2_T16 (SEQ ID NO: 54)   1906                        2324
HUMPHOSLIP_PEA_2_T17 (SEQ ID NO: 55)   1628                        2046
HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)   1842                        2260
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)   1996                        2414

Segment cluster HUMPHOSLIP_PEA_2_node_70 (SEQ ID NO:522) according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6 (SEQ ID NO:51), HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54), HUMPHOSLIP_PEA_2_T17 (SEQ ID NO:55), HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56) and HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 471 below describes the starting and ending position of this segment on each transcript.

TABLE 471 Segment location on transcripts

Transcript name                        Segment starting position   Segment ending position
HUMPHOSLIP_PEA_2_T6 (SEQ ID NO: 51)    2298                        2529
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)    2436                        2667
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)   2429                        2660
HUMPHOSLIP_PEA_2_T16 (SEQ ID NO: 54)   2337                        2568
HUMPHOSLIP_PEA_2_T17 (SEQ ID NO: 55)   2059                        2290
HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)   2273                        2504
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)   2427                        2658

Segment cluster HUMPHOSLIP_PEA_2_node_75 (SEQ ID NO:523) according to the present invention is supported by 14 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6 (SEQ ID NO:51), HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54), HUMPHOSLIP_PEA_2_T17 (SEQ ID NO:55), HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56) and HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 472 below describes the starting and ending position of this segment on each transcript.

TABLE 472 Segment location on transcripts

Transcript name                        Segment starting position   Segment ending position
HUMPHOSLIP_PEA_2_T6 (SEQ ID NO: 51)    2846                        3125
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)    2984                        3263
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)   2977                        3256
HUMPHOSLIP_PEA_2_T16 (SEQ ID NO: 54)   2885                        3164
HUMPHOSLIP_PEA_2_T17 (SEQ ID NO: 55)   2607                        2886
HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)   2821                        3100
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)   2975                        3254

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.

Segment cluster HUMPHOSLIP_PEA_2_node_2 (SEQ ID NO:524) according to the present invention is supported by 159 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6 (SEQ ID NO:51), HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54), HUMPHOSLIP_PEA_2_T17 (SEQ ID NO:55), HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56) and HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 473 below describes the starting and ending position of this segment on each transcript.

TABLE 473 Segment location on transcripts

Transcript name                        Segment starting position   Segment ending position
HUMPHOSLIP_PEA_2_T6 (SEQ ID NO: 51)    265                         337
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)    265                         337
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)   265                         337
HUMPHOSLIP_PEA_2_T16 (SEQ ID NO: 54)   265                         337
HUMPHOSLIP_PEA_2_T17 (SEQ ID NO: 55)   265                         337
HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)   265                         337
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)   265                         337

Segment cluster HUMPHOSLIP_PEA_2_node_3 (SEQ ID NO:525) according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54), HUMPHOSLIP_PEA_2_T17 (SEQ ID NO:55), HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56) and HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 474 below describes the starting and ending position of this segment on each transcript.

TABLE 474 Segment location on transcripts

Transcript name                        Segment starting position   Segment ending position
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)    338                         355
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)   338                         355
HUMPHOSLIP_PEA_2_T16 (SEQ ID NO: 54)   338                         355
HUMPHOSLIP_PEA_2_T17 (SEQ ID NO: 55)   338                         355
HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)   338                         355
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)   338                         355

Segment cluster HUMPHOSLIP_PEA_2_node_4 (SEQ ID NO:526) according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54), HUMPHOSLIP_PEA_2_T17 (SEQ ID NO:55), HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56) and HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 475 below describes the starting and ending position of this segment on each transcript.

TABLE 475 Segment location on transcripts

Transcript name                        Segment starting position   Segment ending position
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)    356                         375
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)   356                         375
HUMPHOSLIP_PEA_2_T16 (SEQ ID NO: 54)   356                         375
HUMPHOSLIP_PEA_2_T17 (SEQ ID NO: 55)   356                         375
HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)   356                         375
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)   356                         375

Segment cluster HUMPHOSLIP_PEA_2_node_6 (SEQ ID NO:527) according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54), HUMPHOSLIP_PEA_2_T17 (SEQ ID NO:55), HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56) and HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 476 below describes the starting and ending position of this segment on each transcript.

TABLE 476 Segment location on transcripts

Transcript name                        Segment starting position   Segment ending position
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)    376                         383
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)   376                         383
HUMPHOSLIP_PEA_2_T16 (SEQ ID NO: 54)   376                         383
HUMPHOSLIP_PEA_2_T17 (SEQ ID NO: 55)   376                         383
HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)   376                         383
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)   376                         383

Segment cluster HUMPHOSLIP_PEA_2_node_7 (SEQ ID NO:528) according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6 (SEQ ID NO:51), HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54), HUMPHOSLIP_PEA_2_T17 (SEQ ID NO:55), HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56) and HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 477 below describes the starting and ending position of this segment on each transcript.

TABLE 477 Segment location on transcripts

Transcript name                        Segment starting position   Segment ending position
HUMPHOSLIP_PEA_2_T6 (SEQ ID NO: 51)    338                         343
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)    384                         389
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)   384                         389
HUMPHOSLIP_PEA_2_T16 (SEQ ID NO: 54)   384                         389
HUMPHOSLIP_PEA_2_T17 (SEQ ID NO: 55)   384                         389
HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)   384                         389
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)   384                         389

Segment cluster HUMPHOSLIP_PEA_2_node_8 (SEQ ID NO:529) according to the present invention is supported by 171 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6 (SEQ ID NO:51), HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54), HUMPHOSLIP_PEA_2_T17 (SEQ ID NO:55), HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56) and HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 478 below describes the starting and ending position of this segment on each transcript.

TABLE 478 Segment location on transcripts

Transcript name                        Segment starting position   Segment ending position
HUMPHOSLIP_PEA_2_T6 (SEQ ID NO: 51)    344                         378
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)    390                         424
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)   390                         424
HUMPHOSLIP_PEA_2_T16 (SEQ ID NO: 54)   390                         424
HUMPHOSLIP_PEA_2_T17 (SEQ ID NO: 55)   390                         424
HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)   390                         424
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)   390                         424

Segment cluster HUMPHOSLIP_PEA_2_node_9 (SEQ ID NO:530) according to the present invention is supported by 168 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6 (SEQ ID NO:51), HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54), HUMPHOSLIP_PEA_2_T17 (SEQ ID NO:55), HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56) and HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 479 below describes the starting and ending position of this segment on each transcript.

TABLE 479 Segment location on transcripts

Transcript name                        Segment starting position   Segment ending position
HUMPHOSLIP_PEA_2_T6 (SEQ ID NO: 51)    379                         429
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)    425                         475
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)   425                         475
HUMPHOSLIP_PEA_2_T16 (SEQ ID NO: 54)   425                         475
HUMPHOSLIP_PEA_2_T17 (SEQ ID NO: 55)   425                         475
HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)   425                         475
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)   425                         475

Segment cluster HUMPHOSLIP_PEA_2_node_14 (SEQ ID NO:531) according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52). Table 480 below describes the starting and ending position of this segment on each transcript.

TABLE 480 Segment location on transcripts

Transcript name                        Segment starting position   Segment ending position
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)    476                         567

Segment cluster HUMPHOSLIP_PEA_2_node_15 (SEQ ID NO:532) according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6 (SEQ ID NO:51), HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54), HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56) and HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 481 below describes the starting and ending position of this segment on each transcript.

TABLE 481 Segment location on transcripts

Transcript name                        Segment starting position   Segment ending position
HUMPHOSLIP_PEA_2_T6 (SEQ ID NO: 51)    430                         445
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)    568                         583
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)   476                         491
HUMPHOSLIP_PEA_2_T16 (SEQ ID NO: 54)   476                         491
HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)   476                         491
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)   476                         491

Segment cluster HUMPHOSLIP_PEA_2_node_16 (SEQ ID NO:533) according to the present invention is supported by 179 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6 (SEQ ID NO:51), HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54), HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56) and HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 482 below describes the starting and ending position of this segment on each transcript.

TABLE 482 Segment location on transcripts

Transcript name                        Segment starting position   Segment ending position
HUMPHOSLIP_PEA_2_T6 (SEQ ID NO: 51)    446                         534
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)    584                         672
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)   492                         580
HUMPHOSLIP_PEA_2_T16 (SEQ ID NO: 54)   492                         580
HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)   492                         580
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)   492                         580

Segment cluster HUMPHOSLIP_PEA_2_node_17 (SEQ ID NO:534) according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6 (SEQ ID NO:51), HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54), HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56) and HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 483 below describes the starting and ending position of this segment on each transcript.

TABLE 483 Segment location on transcripts
Transcript name                       Segment starting position  Segment ending position
HUMPHOSLIP_PEA_2_T6 (SEQ ID NO: 51)   535                        558
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)   673                        696
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)  581                        604
HUMPHOSLIP_PEA_2_T16 (SEQ ID NO: 54)  581                        604
HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)  581                        604
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)  581                        604

Segment cluster HUMPHOSLIP_PEA_2_node_23 (SEQ ID NO:535) according to the present invention is supported by 168 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6 (SEQ ID NO:51), HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54), HUMPHOSLIP_PEA_2_T17 (SEQ ID NO:55), HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56) and HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 484 below describes the starting and ending position of this segment on each transcript.

TABLE 484 Segment location on transcripts
Transcript name                       Segment starting position  Segment ending position
HUMPHOSLIP_PEA_2_T6 (SEQ ID NO: 51)   715                        766
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)   853                        904
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)  761                        812
HUMPHOSLIP_PEA_2_T16 (SEQ ID NO: 54)  761                        812
HUMPHOSLIP_PEA_2_T17 (SEQ ID NO: 55)  476                        527
HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)  605                        656
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)  761                        812

Segment cluster HUMPHOSLIP_PEA_2_node_24 (SEQ ID NO:536) according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6 (SEQ ID NO:51), HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54), HUMPHOSLIP_PEA_2_T17 (SEQ ID NO:55), HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56) and HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 485 below describes the starting and ending position of this segment on each transcript.

TABLE 485 Segment location on transcripts
Transcript name                       Segment starting position  Segment ending position
HUMPHOSLIP_PEA_2_T6 (SEQ ID NO: 51)   767                        778
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)   905                        916
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)  813                        824
HUMPHOSLIP_PEA_2_T16 (SEQ ID NO: 54)  813                        824
HUMPHOSLIP_PEA_2_T17 (SEQ ID NO: 55)  528                        539
HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)  657                        668
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)  813                        824

Segment cluster HUMPHOSLIP_PEA_2_node_25 (SEQ ID NO:537) according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53) and HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56). Table 486 below describes the starting and ending position of this segment on each transcript.

TABLE 486 Segment location on transcripts
Transcript name                       Segment starting position  Segment ending position
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)  825                        909
HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)  669                        753

Segment cluster HUMPHOSLIP_PEA_2_node_26 (SEQ ID NO:538) according to the present invention is supported by 163 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6 (SEQ ID NO:51), HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54), HUMPHOSLIP_PEA_2_T17 (SEQ ID NO:55), HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56) and HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 487 below describes the starting and ending position of this segment on each transcript.

TABLE 487 Segment location on transcripts
Transcript name                       Segment starting position  Segment ending position
HUMPHOSLIP_PEA_2_T6 (SEQ ID NO: 51)   779                        842
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)   917                        980
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)  910                        973
HUMPHOSLIP_PEA_2_T16 (SEQ ID NO: 54)  825                        888
HUMPHOSLIP_PEA_2_T17 (SEQ ID NO: 55)  540                        603
HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)  754                        817
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)  825                        888
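Tables 485 to 487 also show how a segment that occurs in only some transcripts shifts the coordinates of later segments. The Python sketch below is an illustration only: it uses three of the transcripts, hypothetical variable names, and an assumed inclusive reading of the positions. It compares the gap between node_24 and node_26 with the length of node_25 on each transcript.

```python
# Ending position of node_24 (Table 485) and starting position of node_26
# (Table 487) on three of the transcripts.
NODE_24_END = {"T14": 824, "T16": 824, "T18": 668}
NODE_26_START = {"T14": 910, "T16": 825, "T18": 754}

# node_25 (Table 486) is listed only for T14 and T18.
NODE_25 = {"T14": (825, 909), "T18": (669, 753)}


def inserted_length(transcript):
    """Number of positions lying between node_24 and node_26 (inclusive coords)."""
    return NODE_26_START[transcript] - NODE_24_END[transcript] - 1


def node_25_length(transcript):
    """Length of node_25 on a transcript, or 0 if the transcript lacks it."""
    if transcript not in NODE_25:
        return 0
    start, end = NODE_25[transcript]
    return end - start + 1


for t in NODE_24_END:
    # The gap should equal the length of node_25 where present, else zero.
    print(t, inserted_length(t), node_25_length(t))
```

On these figures the gap matches the length of node_25 on the two transcripts that contain it and is zero on the transcript that does not.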

Segment cluster HUMPHOSLIP_PEA_2_node_29 (SEQ ID NO:539) according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6 (SEQ ID NO:51), HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), HUMPHOSLIP_PEA_2_T17 (SEQ ID NO:55), HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56) and HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 488 below describes the starting and ending position of this segment on each transcript.

TABLE 488 Segment location on transcripts
Transcript name                       Segment starting position  Segment ending position
HUMPHOSLIP_PEA_2_T6 (SEQ ID NO: 51)   843                        849
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)   981                        987
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)  974                        980
HUMPHOSLIP_PEA_2_T17 (SEQ ID NO: 55)  604                        610
HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)  818                        824
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)  889                        895

Segment cluster HUMPHOSLIP_PEA_2_node_30 (SEQ ID NO:540) according to the present invention is supported by 181 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6 (SEQ ID NO:51), HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54), HUMPHOSLIP_PEA_2_T17 (SEQ ID NO:55), HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56) and HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 489 below describes the starting and ending position of this segment on each transcript.

TABLE 489 Segment location on transcripts
Transcript name                       Segment starting position  Segment ending position
HUMPHOSLIP_PEA_2_T6 (SEQ ID NO: 51)   850                        934
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)   988                        1072
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)  981                        1065
HUMPHOSLIP_PEA_2_T16 (SEQ ID NO: 54)  889                        973
HUMPHOSLIP_PEA_2_T17 (SEQ ID NO: 55)  611                        695
HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)  825                        909
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)  896                        980

Segment cluster HUMPHOSLIP_PEA_2_node_33 (SEQ ID NO:541) according to the present invention is supported by 173 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6 (SEQ ID NO:51), HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54), HUMPHOSLIP_PEA_2_T17 (SEQ ID NO:55), HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56) and HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 490 below describes the starting and ending position of this segment on each transcript.

TABLE 490 Segment location on transcripts
Transcript name                       Segment starting position  Segment ending position
HUMPHOSLIP_PEA_2_T6 (SEQ ID NO: 51)   935                        970
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)   1073                       1108
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)  1066                       1101
HUMPHOSLIP_PEA_2_T16 (SEQ ID NO: 54)  974                        1009
HUMPHOSLIP_PEA_2_T17 (SEQ ID NO: 55)  696                        731
HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)  910                        945
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)  981                        1016

Segment cluster HUMPHOSLIP_PEA_2_node_36 (SEQ ID NO:542) according to the present invention is supported by 163 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6 (SEQ ID NO:51), HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54), HUMPHOSLIP_PEA_2_T17 (SEQ ID NO:55), HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56) and HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 491 below describes the starting and ending position of this segment on each transcript.

TABLE 491 Segment location on transcripts
Transcript name                       Segment starting position  Segment ending position
HUMPHOSLIP_PEA_2_T6 (SEQ ID NO: 51)   1112                       1156
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)   1250                       1294
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)  1243                       1287
HUMPHOSLIP_PEA_2_T16 (SEQ ID NO: 54)  1151                       1195
HUMPHOSLIP_PEA_2_T17 (SEQ ID NO: 55)  873                        917
HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)  1087                       1131
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)  1158                       1202

Segment cluster HUMPHOSLIP_PEA_2_node_37 (SEQ ID NO:543) according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6 (SEQ ID NO:51), HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54), HUMPHOSLIP_PEA_2_T17 (SEQ ID NO:55), HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56) and HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 492 below describes the starting and ending position of this segment on each transcript.

TABLE 492 Segment location on transcripts
Transcript name                       Segment starting position  Segment ending position
HUMPHOSLIP_PEA_2_T6 (SEQ ID NO: 51)   1157                       1171
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)   1295                       1309
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)  1288                       1302
HUMPHOSLIP_PEA_2_T16 (SEQ ID NO: 54)  1196                       1210
HUMPHOSLIP_PEA_2_T17 (SEQ ID NO: 55)  918                        932
HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)  1132                       1146
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)  1203                       1217

Segment cluster HUMPHOSLIP_PEA_2_node_39 (SEQ ID NO:544) according to the present invention is supported by 166 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6 (SEQ ID NO:51), HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54), HUMPHOSLIP_PEA_2_T17 (SEQ ID NO:55), HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56) and HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 493 below describes the starting and ending position of this segment on each transcript.

TABLE 493 Segment location on transcripts
Transcript name                       Segment starting position  Segment ending position
HUMPHOSLIP_PEA_2_T6 (SEQ ID NO: 51)   1172                       1201
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)   1310                       1339
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)  1303                       1332
HUMPHOSLIP_PEA_2_T16 (SEQ ID NO: 54)  1211                       1240
HUMPHOSLIP_PEA_2_T17 (SEQ ID NO: 55)  933                        962
HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)  1147                       1176
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)  1218                       1247

Segment cluster HUMPHOSLIP_PEA_2_node_40 (SEQ ID NO:545) according to the present invention is supported by 199 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6 (SEQ ID NO:51), HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54), HUMPHOSLIP_PEA_2_T17 (SEQ ID NO:55), HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56) and HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 494 below describes the starting and ending position of this segment on each transcript.

TABLE 494 Segment location on transcripts
Transcript name                       Segment starting position  Segment ending position
HUMPHOSLIP_PEA_2_T6 (SEQ ID NO: 51)   1202                       1288
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)   1340                       1426
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)  1333                       1419
HUMPHOSLIP_PEA_2_T16 (SEQ ID NO: 54)  1241                       1327
HUMPHOSLIP_PEA_2_T17 (SEQ ID NO: 55)  963                        1049
HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)  1177                       1263
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)  1248                       1334

Segment cluster HUMPHOSLIP_PEA_2_node_41 (SEQ ID NO:546) according to the present invention is supported by 186 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6 (SEQ ID NO:51), HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54), HUMPHOSLIP_PEA_2_T17 (SEQ ID NO:55), HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56) and HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 495 below describes the starting and ending position of this segment on each transcript.

TABLE 495 Segment location on transcripts
Transcript name                       Segment starting position  Segment ending position
HUMPHOSLIP_PEA_2_T6 (SEQ ID NO: 51)   1289                       1318
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)   1427                       1456
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)  1420                       1449
HUMPHOSLIP_PEA_2_T16 (SEQ ID NO: 54)  1328                       1357
HUMPHOSLIP_PEA_2_T17 (SEQ ID NO: 55)  1050                       1079
HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)  1264                       1293
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)  1335                       1364

Segment cluster HUMPHOSLIP_PEA_2_node_42 (SEQ ID NO:547) according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6 (SEQ ID NO:51), HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54), HUMPHOSLIP_PEA_2_T17 (SEQ ID NO:55), HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56) and HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 496 below describes the starting and ending position of this segment on each transcript.

TABLE 496 Segment location on transcripts
Transcript name                       Segment starting position  Segment ending position
HUMPHOSLIP_PEA_2_T6 (SEQ ID NO: 51)   1319                       1336
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)   1457                       1474
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)  1450                       1467
HUMPHOSLIP_PEA_2_T16 (SEQ ID NO: 54)  1358                       1375
HUMPHOSLIP_PEA_2_T17 (SEQ ID NO: 55)  1080                       1097
HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)  1294                       1311
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)  1365                       1382

Segment cluster HUMPHOSLIP_PEA_2_node_44 (SEQ ID NO:548) according to the present invention is supported by 185 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6 (SEQ ID NO:51), HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54), HUMPHOSLIP_PEA_2_T17 (SEQ ID NO:55), HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56) and HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 497 below describes the starting and ending position of this segment on each transcript.

TABLE 497 Segment location on transcripts
Transcript name                       Segment starting position  Segment ending position
HUMPHOSLIP_PEA_2_T6 (SEQ ID NO: 51)   1337                       1363
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)   1475                       1501
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)  1468                       1494
HUMPHOSLIP_PEA_2_T16 (SEQ ID NO: 54)  1376                       1402
HUMPHOSLIP_PEA_2_T17 (SEQ ID NO: 55)  1098                       1124
HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)  1312                       1338
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)  1383                       1409

Segment cluster HUMPHOSLIP_PEA_2_node_45 (SEQ ID NO:549) according to the present invention is supported by 197 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6 (SEQ ID NO:51), HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54), HUMPHOSLIP_PEA_2_T17 (SEQ ID NO:55), HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56) and HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 498 below describes the starting and ending position of this segment on each transcript.

TABLE 498 Segment location on transcripts
Transcript name                       Segment starting position  Segment ending position
HUMPHOSLIP_PEA_2_T6 (SEQ ID NO: 51)   1364                       1404
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)   1502                       1542
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)  1495                       1535
HUMPHOSLIP_PEA_2_T16 (SEQ ID NO: 54)  1403                       1443
HUMPHOSLIP_PEA_2_T17 (SEQ ID NO: 55)  1125                       1165
HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)  1339                       1379
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)  1410                       1450

Segment cluster HUMPHOSLIP_PEA_2_node_47 (SEQ ID NO:550) according to the present invention is supported by 223 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6 (SEQ ID NO:51), HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54), HUMPHOSLIP_PEA_2_T17 (SEQ ID NO:55), HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56) and HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 499 below describes the starting and ending position of this segment on each transcript.

TABLE 499 Segment location on transcripts
Transcript name                       Segment starting position  Segment ending position
HUMPHOSLIP_PEA_2_T6 (SEQ ID NO: 51)   1405                       1447
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)   1543                       1585
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)  1536                       1578
HUMPHOSLIP_PEA_2_T16 (SEQ ID NO: 54)  1444                       1486
HUMPHOSLIP_PEA_2_T17 (SEQ ID NO: 55)  1166                       1208
HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)  1380                       1422
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)  1451                       1493

Segment cluster HUMPHOSLIP_PEA_2_node_51 (SEQ ID NO:551) according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6 (SEQ ID NO:51), HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54), HUMPHOSLIP_PEA_2_T17 (SEQ ID NO:55), HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56) and HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 500 below describes the starting and ending position of this segment on each transcript.

TABLE 500 Segment location on transcripts
Transcript name                       Segment starting position  Segment ending position
HUMPHOSLIP_PEA_2_T6 (SEQ ID NO: 51)   1448                       1462
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)   1586                       1600
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)  1579                       1593
HUMPHOSLIP_PEA_2_T16 (SEQ ID NO: 54)  1487                       1501
HUMPHOSLIP_PEA_2_T17 (SEQ ID NO: 55)  1209                       1223
HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)  1423                       1437
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)  1494                       1508

Segment cluster HUMPHOSLIP_PEA_2_node_52 (SEQ ID NO:552) according to the present invention is supported by 235 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6 (SEQ ID NO:51), HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54), HUMPHOSLIP_PEA_2_T17 (SEQ ID NO:55), HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56) and HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 501 below describes the starting and ending position of this segment on each transcript.

TABLE 501 Segment location on transcripts
Transcript name                       Segment starting position  Segment ending position
HUMPHOSLIP_PEA_2_T6 (SEQ ID NO: 51)   1463                       1511
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)   1601                       1649
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)  1594                       1642
HUMPHOSLIP_PEA_2_T16 (SEQ ID NO: 54)  1502                       1550
HUMPHOSLIP_PEA_2_T17 (SEQ ID NO: 55)  1224                       1272
HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)  1438                       1486
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)  1509                       1557

Segment cluster HUMPHOSLIP_PEA_2_node_53 (SEQ ID NO:553) according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 502 below describes the starting and ending position of this segment on each transcript.

TABLE 502 Segment location on transcripts
Transcript name                       Segment starting position  Segment ending position
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)  1558                       1640

Segment cluster HUMPHOSLIP_PEA_2_node_54 (SEQ ID NO:554) according to the present invention is supported by 236 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6 (SEQ ID NO:51), HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54), HUMPHOSLIP_PEA_2_T17 (SEQ ID NO:55), HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56) and HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 503 below describes the starting and ending position of this segment on each transcript.

TABLE 503 Segment location on transcripts
Transcript name                       Segment starting position  Segment ending position
HUMPHOSLIP_PEA_2_T6 (SEQ ID NO: 51)   1512                       1552
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)   1650                       1690
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)  1643                       1683
HUMPHOSLIP_PEA_2_T16 (SEQ ID NO: 54)  1551                       1591
HUMPHOSLIP_PEA_2_T17 (SEQ ID NO: 55)  1273                       1313
HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)  1487                       1527
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)  1641                       1681

Segment cluster HUMPHOSLIP_PEA_2_node_55 (SEQ ID NO:555) according to the present invention is supported by 232 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6 (SEQ ID NO:51), HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54), HUMPHOSLIP_PEA_2_T17 (SEQ ID NO:55), HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56) and HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 504 below describes the starting and ending position of this segment on each transcript.

TABLE 504 Segment location on transcripts
Transcript name                       Segment starting position  Segment ending position
HUMPHOSLIP_PEA_2_T6 (SEQ ID NO: 51)   1553                       1588
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)   1691                       1726
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)  1684                       1719
HUMPHOSLIP_PEA_2_T16 (SEQ ID NO: 54)  1592                       1627
HUMPHOSLIP_PEA_2_T17 (SEQ ID NO: 55)  1314                       1349
HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)  1528                       1563
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)  1682                       1717

Segment cluster HUMPHOSLIP_PEA_2_node_58 (SEQ ID NO:556) according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6 (SEQ ID NO:51), HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54), HUMPHOSLIP_PEA_2_T17 (SEQ ID NO:55), HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56) and HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 505 below describes the starting and ending position of this segment on each transcript.

TABLE 505 Segment location on transcripts
Transcript name                       Segment starting position  Segment ending position
HUMPHOSLIP_PEA_2_T6 (SEQ ID NO: 51)   1589                       1612
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)   1727                       1750
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)  1720                       1743
HUMPHOSLIP_PEA_2_T16 (SEQ ID NO: 54)  1628                       1651
HUMPHOSLIP_PEA_2_T17 (SEQ ID NO: 55)  1350                       1373
HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)  1564                       1587
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)  1718                       1741

Segment cluster HUMPHOSLIP_PEA_2_node_59 (SEQ ID NO:557) according to the present invention is supported by 230 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6 (SEQ ID NO:51), HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54), HUMPHOSLIP_PEA_2_T17 (SEQ ID NO:55), HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56) and HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 506 below describes the starting and ending position of this segment on each transcript.

TABLE 506 Segment location on transcripts
Transcript name                       Segment starting position  Segment ending position
HUMPHOSLIP_PEA_2_T6 (SEQ ID NO: 51)   1613                       1648
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)   1751                       1786
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)  1744                       1779
HUMPHOSLIP_PEA_2_T16 (SEQ ID NO: 54)  1652                       1687
HUMPHOSLIP_PEA_2_T17 (SEQ ID NO: 55)  1374                       1409
HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)  1588                       1623
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)  1742                       1777

Segment cluster HUMPHOSLIP_PEA_2_node_60 (SEQ ID NO:558) according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6 (SEQ ID NO:51), HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54), HUMPHOSLIP_PEA_2_T17 (SEQ ID NO:55), HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56) and HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 507 below describes the starting and ending position of this segment on each transcript.

TABLE 507 Segment location on transcripts
Transcript name                       Segment starting position  Segment ending position
HUMPHOSLIP_PEA_2_T6 (SEQ ID NO: 51)   1649                       1671
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)   1787                       1809
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)  1780                       1802
HUMPHOSLIP_PEA_2_T16 (SEQ ID NO: 54)  1688                       1710
HUMPHOSLIP_PEA_2_T17 (SEQ ID NO: 55)  1410                       1432
HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)  1624                       1646
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)  1778                       1800

Segment cluster HUMPHOSLIP_PEA_2_node_61 (SEQ ID NO:559) according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6 (SEQ ID NO:51), HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54), HUMPHOSLIP_PEA_2_T17 (SEQ ID NO:55), HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56) and HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 508 below describes the starting and ending position of this segment on each transcript.

TABLE 508 Segment location on transcripts
Transcript name                       Segment starting position  Segment ending position
HUMPHOSLIP_PEA_2_T6 (SEQ ID NO: 51)   1672                       1680
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)   1810                       1818
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)  1803                       1811
HUMPHOSLIP_PEA_2_T16 (SEQ ID NO: 54)  1711                       1719
HUMPHOSLIP_PEA_2_T17 (SEQ ID NO: 55)  1433                       1441
HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)  1647                       1655
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)  1801                       1809

Segment cluster HUMPHOSLIP_PEA_2_node_62 (SEQ ID NO:560) according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6 (SEQ ID NO:51), HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54), HUMPHOSLIP_PEA_2_T17 (SEQ ID NO:55), HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56) and HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 509 below describes the starting and ending position of this segment on each transcript.

TABLE 509 Segment location on transcripts
Transcript name                       Segment starting position  Segment ending position
HUMPHOSLIP_PEA_2_T6 (SEQ ID NO: 51)   1681                       1703
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)   1819                       1841
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)  1812                       1834
HUMPHOSLIP_PEA_2_T16 (SEQ ID NO: 54)  1720                       1742
HUMPHOSLIP_PEA_2_T17 (SEQ ID NO: 55)  1442                       1464
HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)  1656                       1678
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)  1810                       1832

Segment cluster HUMPHOSLIP_PEA_2_node_63 (SEQ ID NO:561) according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6 (SEQ ID NO:51), HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54), HUMPHOSLIP_PEA_2_T17 (SEQ ID NO:55), HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56) and HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 510 below describes the starting and ending position of this segment on each transcript.

TABLE 510 Segment location on transcripts
Transcript name                       Segment starting position  Segment ending position
HUMPHOSLIP_PEA_2_T6 (SEQ ID NO: 51)   1704                       1727
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)   1842                       1865
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)  1835                       1858
HUMPHOSLIP_PEA_2_T16 (SEQ ID NO: 54)  1743                       1766
HUMPHOSLIP_PEA_2_T17 (SEQ ID NO: 55)  1465                       1488
HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)  1679                       1702
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)  1833                       1856

Segment cluster HUMPHOSLIP_PEA_2_node_64 (SEQ ID NO:562) according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6 (SEQ ID NO:51), HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54), HUMPHOSLIP_PEA_2_T17 (SEQ ID NO:55), HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56) and HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 511 below describes the starting and ending position of this segment on each transcript.

TABLE 511 Segment location on transcripts
Transcript name                       Segment starting position  Segment ending position
HUMPHOSLIP_PEA_2_T6 (SEQ ID NO: 51)   1728                       1734
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)   1866                       1872
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)  1859                       1865
HUMPHOSLIP_PEA_2_T16 (SEQ ID NO: 54)  1767                       1773
HUMPHOSLIP_PEA_2_T17 (SEQ ID NO: 55)  1489                       1495
HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)  1703                       1709
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)  1857                       1863

Segment cluster HUMPHOSLIP_PEA_2_node_65 (SEQ ID NO:563) according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6 (SEQ ID NO:51), HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54), HUMPHOSLIP_PEA_2_T17 (SEQ ID NO:55), HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56) and HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 512 below describes the starting and ending position of this segment on each transcript.

TABLE 512 Segment location on transcripts
Transcript name                       Segment starting position  Segment ending position
HUMPHOSLIP_PEA_2_T6 (SEQ ID NO: 51)   1735                       1754
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)   1873                       1892
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)  1866                       1885
HUMPHOSLIP_PEA_2_T16 (SEQ ID NO: 54)  1774                       1793
HUMPHOSLIP_PEA_2_T17 (SEQ ID NO: 55)  1496                       1515
HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)  1710                       1729
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)  1864                       1883

Segment cluster HUMPHOSLIP_PEA_2_node_66 (SEQ ID NO:564) according to the present invention is supported by 180 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6 (SEQ ID NO:51), HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54), HUMPHOSLIP_PEA_2_T17 (SEQ ID NO:55), HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56) and HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 513 below describes the starting and ending position of this segment on each transcript.

TABLE 513 Segment location on transcripts
Transcript name                       Segment starting position  Segment ending position
HUMPHOSLIP_PEA_2_T6 (SEQ ID NO: 51)   1755                       1844
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)   1893                       1982
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)  1886                       1975
HUMPHOSLIP_PEA_2_T16 (SEQ ID NO: 54)  1794                       1883
HUMPHOSLIP_PEA_2_T17 (SEQ ID NO: 55)  1516                       1605
HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)  1730                       1819
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)  1884                       1973

Segment cluster HUMPHOSLIP_PEA_2_node_67 (SEQ ID NO:565) according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6 (SEQ ID NO:51), HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54), HUMPHOSLIP_PEA_2_T17 (SEQ ID NO:55), HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56) and HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 514 below describes the starting and ending position of this segment on each transcript.

TABLE 514 Segment location on transcripts
Transcript name                       Segment starting position  Segment ending position
HUMPHOSLIP_PEA_2_T6 (SEQ ID NO: 51)   1845                       1866
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)   1983                       2004
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)  1976                       1997
HUMPHOSLIP_PEA_2_T16 (SEQ ID NO: 54)  1884                       1905
HUMPHOSLIP_PEA_2_T17 (SEQ ID NO: 55)  1606                       1627
HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)  1820                       1841
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)  1974                       1995

Segment cluster HUMPHOSLIP_PEA_2_node_69 (SEQ ID NO:566) according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6 (SEQ ID NO:51), HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54), HUMPHOSLIP_PEA_2_T17 (SEQ ID NO:55), HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56) and HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 515 below describes the starting and ending position of this segment on each transcript.

TABLE 515 Segment location on transcripts
Transcript name                       Segment starting position  Segment ending position
HUMPHOSLIP_PEA_2_T6 (SEQ ID NO: 51)   2286                       2297
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)   2424                       2435
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)  2417                       2428
HUMPHOSLIP_PEA_2_T16 (SEQ ID NO: 54)  2325                       2336
HUMPHOSLIP_PEA_2_T17 (SEQ ID NO: 55)  2047                       2058
HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)  2261                       2272
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)  2415                       2426

Segment cluster HUMPHOSLIP_PEA_2_node_71 (SEQ ID NO:567) according to the present invention can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6 (SEQ ID NO:51), HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54), HUMPHOSLIP_PEA_2_T17 (SEQ ID NO:55), HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56) and HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 516 below describes the starting and ending position of this segment on each transcript.

TABLE 516 Segment location on transcripts
Transcript name                       Segment starting position  Segment ending position
HUMPHOSLIP_PEA_2_T6 (SEQ ID NO: 51)   2530                       2542
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)   2668                       2680
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)  2661                       2673
HUMPHOSLIP_PEA_2_T16 (SEQ ID NO: 54)  2569                       2581
HUMPHOSLIP_PEA_2_T17 (SEQ ID NO: 55)  2291                       2303
HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)  2505                       2517
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)  2659                       2671

Segment cluster HUMPHOSLIP_PEA_2_node_72 (SEQ ID NO:568) according to the present invention is supported by 7 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6 (SEQ ID NO:51), HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54), HUMPHOSLIP_PEA_2_T17 (SEQ ID NO:55), HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56) and HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 517 below describes the starting and ending position of this segment on each transcript.

TABLE 517 Segment location on transcripts

Transcript name                          Segment starting position   Segment ending position
HUMPHOSLIP_PEA_2_T6 (SEQ ID NO: 51)      2543                        2647
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)      2681                        2785
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)     2674                        2778
HUMPHOSLIP_PEA_2_T16 (SEQ ID NO: 54)     2582                        2686
HUMPHOSLIP_PEA_2_T17 (SEQ ID NO: 55)     2304                        2408
HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)     2518                        2622
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)     2672                        2776

Segment cluster HUMPHOSLIP_PEA_2_node_73 (SEQ ID NO:569) according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6 (SEQ ID NO:51), HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54), HUMPHOSLIP_PEA_2_T17 (SEQ ID NO:55), HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56) and HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 518 below describes the starting and ending position of this segment on each transcript.

TABLE 518 Segment location on transcripts

Transcript name                          Segment starting position   Segment ending position
HUMPHOSLIP_PEA_2_T6 (SEQ ID NO: 51)      2648                        2755
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)      2786                        2893
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)     2779                        2886
HUMPHOSLIP_PEA_2_T16 (SEQ ID NO: 54)     2687                        2794
HUMPHOSLIP_PEA_2_T17 (SEQ ID NO: 55)     2409                        2516
HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)     2623                        2730
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)     2777                        2884

Segment cluster HUMPHOSLIP_PEA_2_node_74 (SEQ ID NO:570) according to the present invention is supported by 10 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPHOSLIP_PEA_2_T6 (SEQ ID NO:51), HUMPHOSLIP_PEA_2_T7 (SEQ ID NO:52), HUMPHOSLIP_PEA_2_T14 (SEQ ID NO:53), HUMPHOSLIP_PEA_2_T16 (SEQ ID NO:54), HUMPHOSLIP_PEA_2_T17 (SEQ ID NO:55), HUMPHOSLIP_PEA_2_T18 (SEQ ID NO:56) and HUMPHOSLIP_PEA_2_T19 (SEQ ID NO:57). Table 519 below describes the starting and ending position of this segment on each transcript.

TABLE 519 Segment location on transcripts

Transcript name                          Segment starting position   Segment ending position
HUMPHOSLIP_PEA_2_T6 (SEQ ID NO: 51)      2756                        2845
HUMPHOSLIP_PEA_2_T7 (SEQ ID NO: 52)      2894                        2983
HUMPHOSLIP_PEA_2_T14 (SEQ ID NO: 53)     2887                        2976
HUMPHOSLIP_PEA_2_T16 (SEQ ID NO: 54)     2795                        2884
HUMPHOSLIP_PEA_2_T17 (SEQ ID NO: 55)     2517                        2606
HUMPHOSLIP_PEA_2_T18 (SEQ ID NO: 56)     2731                        2820
HUMPHOSLIP_PEA_2_T19 (SEQ ID NO: 57)     2885                        2974

Variant protein alignment to the previously known protein:

Sequence name: PLTP_HUMAN (SEQ ID NO: 1433)
Sequence documentation:
Alignment of: HUMPHOSLIP_PEA_2_P10 (SEQ ID NO: 1327) × PLTP_HUMAN (SEQ ID NO: 1433)
. . .
Alignment segment 1/1:
Quality: 3716.00
Escore: 0
Matching length: 398
Total length: 493
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 80.73
Total Percent Identity: 80.73
Gaps: 1
Alignment:

Sequence name: PLTP_HUMAN (SEQ ID NO: 1433)
Sequence documentation:
Alignment of: HUMPHOSLIP_PEA_2_P12 (SEQ ID NO: 1328) × PLTP_HUMAN (SEQ ID NO: 1433)
. . .
Alignment segment 1/1:
Quality: 4101.00
Escore: 0
Matching length: 427
Total length: 427
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

Sequence name: PLTP_HUMAN (SEQ ID NO: 1433)
Sequence documentation:
Alignment of: HUMPHOSLIP_PEA_2_P31 (SEQ ID NO: 1330) × PLTP_HUMAN (SEQ ID NO: 1433)
. . .
Alignment segment 1/1:
Quality: 639.00
Escore: 0
Matching length: 67
Total length: 67
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

Sequence name: PLTP_HUMAN (SEQ ID NO: 1433)
Sequence documentation:
Alignment of: HUMPHOSLIP_PEA_2_P33 (SEQ ID NO: 1331) × PLTP_HUMAN (SEQ ID NO: 1433)
Alignment segment 1/1:
Quality: 1767.00
Escore: 0
Matching length: 184
Total length: 184
Matching Percent Similarity: 100.00
Matching Percent Identity: 99.46
Total Percent Similarity: 100.00
Total Percent Identity: 99.46
Gaps: 0
Alignment:

Sequence name: PLTP_HUMAN (SEQ ID NO: 1433)
Sequence documentation:
Alignment of: HUMPHOSLIP_PEA_2_P34 (SEQ ID NO: 1332) × PLTP_HUMAN (SEQ ID NO: 1433)
. . .
Alignment segment 1/1:
Quality: 1971.00
Escore: 0
Matching length: 205
Total length: 205
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

Sequence name: PLTP_HUMAN (SEQ ID NO: 1433)
Sequence documentation:
Alignment of: HUMPHOSLIP_PEA_2_P35 (SEQ ID NO: 1333) × PLTP_HUMAN (SEQ ID NO: 1433)
. . .
Alignment segment 1/1:
Quality: 1158.00
Escore: 0
Matching length: 132
Total length: 184
Matching Percent Similarity: 100.00
Matching Percent Identity: 98.48
Total Percent Similarity: 71.74
Total Percent Identity: 70.65
Gaps: 1
Alignment:
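
The summary statistics in the alignment headers above appear to be related by a simple ratio: each "Total" percentage looks like the corresponding "Matching" percentage rescaled by matching length over total length. The following is a minimal sketch under that assumption; the relationship is inferred from the reported numbers rather than stated in this document, and the function name total_percent is hypothetical and is not part of the alignment software used. It reproduces the totals reported for HUMPHOSLIP_PEA_2_P10 and HUMPHOSLIP_PEA_2_P35.

```python
def total_percent(matching_percent, matching_length, total_length):
    """Rescale a percentage over the matched region to the full alignment length.

    Assumption (inferred, not stated in the source):
    total % = matching % * matching length / total length.
    """
    return round(matching_percent * matching_length / total_length, 2)


# Header values copied from the alignments above.
p10_identity = total_percent(100.00, 398, 493)    # HUMPHOSLIP_PEA_2_P10
p35_similarity = total_percent(100.00, 132, 184)  # HUMPHOSLIP_PEA_2_P35
p35_identity = total_percent(98.48, 132, 184)     # HUMPHOSLIP_PEA_2_P35

print(p10_identity, p35_similarity, p35_identity)
```

Under this reading, a variant whose matching length equals its total length (for example HUMPHOSLIP_PEA_2_P12) has identical "Matching" and "Total" values.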

Description for Cluster AI076020

Cluster AI076020 features 1 transcript(s) and 8 segment(s) of interest, the names for which are given in Tables 520 and 521, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 522.

TABLE 520 Transcripts of interest

Transcript Name     Sequence ID No.
AI076020_T0         58

TABLE 521 Segments of interest

Segment Name        Sequence ID No.
AI076020_node_0     571
AI076020_node_3     572
AI076020_node_8     573
AI076020_node_1     574
AI076020_node_4     575
AI076020_node_5     576
AI076020_node_6     577
AI076020_node_7     578

TABLE 522 Proteins of interest

Protein Name     Sequence ID No.     Corresponding Transcript(s)
AI076020_P1      1334                AI076020_T0 (SEQ ID NO: 58)

These sequences are variants of the known protein C1q-related factor precursor (SwissProt accession identifier C1RF_HUMAN), SEQ ID NO: 1434, referred to herein as the previously known protein.

The sequence for protein C1q-related factor precursor (SEQ ID NO:1434) is given at the end of the application, as “C1q-related factor precursor amino acid sequence”.

The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: locomotory behavior, which are annotation(s) related to Biological Process.

The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <dot expasy dot ch/sprot/>; or Locuslink, available from <dot ncbi dot nlm dot nih dot gov/projects/LocusLink/>.

Cluster AI076020 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term “number” in the right hand column of the table and the numbers on the y-axis of FIG. 31 refer to weighted expression of ESTs in each category, as “parts per million” (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
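
The “parts per million” ratio described above can be illustrated with a minimal sketch. The weighting scheme applied to EST expression is described elsewhere and is not reproduced here, so the sketch below is an unweighted version; the function name parts_per_million and the example counts are hypothetical and are not taken from the data of this application.

```python
def parts_per_million(cluster_est_count, category_est_count):
    """ESTs for one cluster expressed per million ESTs in a tissue category.

    Unweighted illustration only; the weighting used in the application
    is an unstated detail here and is deliberately left out.
    """
    if category_est_count <= 0:
        raise ValueError("category_est_count must be positive")
    return cluster_est_count * 1_000_000 / category_est_count


# Hypothetical counts: 3 cluster ESTs among 100,000 ESTs in one category.
example_ppm = parts_per_million(3, 100_000)
print(example_ppm)
```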

Overall, the following results were obtained as shown with regard to the histograms in FIG. 31 and Table 523. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: brain malignant tumors and a mixture of malignant tumors from different tissues.

TABLE 523 Normal tissue distribution

Name of Tissue     Number
bone               0
brain              9
epithelial         0
general            4
kidney             2
lung               0
ovary              0
pancreas           30
uterus             0

TABLE 524 P values and ratios for expression in cancerous tissue

Name of Tissue     P1          P2          SP1         R3       SP2         R4
bone               3.3e−01     5.9e−02     4.0e−01     2.5      2.4e−01     3.0
brain              8.8e−04     2.2e−03     5.5e−11     14.2     4.6e−08     8.7
epithelial         2.6e−01     8.6e−02     2.8e−01     2.4      1.8e−02     4.5
general            2.1e−03     3.0e−04     2.0e−06     4.3      8.4e−06     3.5
kidney             5.5e−01     3.3e−01     3.4e−01     2.3      8.2e−02     3.3
lung               1           6.3e−01     1           1.0      3.8e−01     2.2
ovary              4.2e−01     4.5e−01     0.0e+00     0.0      0.0e+00     0.0
pancreas           6.0e−01     7.1e−01     8.9e−01     0.6      9.5e−01     0.5
uterus             1           4.0e−01     1           1.0      6.4e−01     1.5

As noted above, cluster AI076020 features 1 transcript(s), which were listed in Table 520 above. These transcript(s) encode for protein(s) which are variant(s) of protein C1q-related factor precursor (SEQ ID NO:1434). A description of each variant protein according to the present invention is now provided.

Variant protein AI076020_P1 (SEQ ID NO:1334) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) AI076020_T0 (SEQ ID NO:58). The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein AI076020_P1 (SEQ ID NO:1334) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 525 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein AI076020_P1 (SEQ ID NO:1334) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 525 Amino acid mutations

SNP position(s) on amino acid sequence     Alternative amino acid(s)     Previously known SNP?
36                                         P -> R                        Yes
66                                         Q -> R                        Yes
165                                        K -> R                        Yes

Variant protein AI076020_P1 (SEQ ID NO:1334) is encoded by the following transcript(s): AI076020_T0 (SEQ ID NO:58), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript AI076020_T0 (SEQ ID NO:58) is shown in bold; this coding portion starts at position 261 and ends at position 1034. The transcript also has the following SNPs as listed in Table 526 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein AI076020_P1 (SEQ ID NO:1334) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 526 Nucleic acid SNPs

SNP position on nucleotide sequence     Alternative nucleic acid     Previously known SNP?
367                                     C -> G                       Yes
457                                     A -> G                       Yes
464                                     C -> A                       Yes
754                                     A -> G                       Yes
1265                                    C -> T                       Yes
1384                                    C -> T                       Yes
1402                                    G -> C                       Yes
1452                                    T -> C                       Yes
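
The nucleotide SNP positions of Table 526 can be related to the amino acid positions of Table 525 through the stated coding portion (positions 261 to 1034 of AI076020_T0). The following is a minimal sketch, assuming that the coding portion is read in frame from its first nucleotide and that positions are 1-based and inclusive; the function name codon_number is hypothetical.

```python
CODING_START = 261   # first nucleotide of the coding portion of AI076020_T0
CODING_END = 1034    # last nucleotide of the coding portion


def codon_number(nucleotide_position):
    """Return the 1-based codon number for a position, or None outside the coding portion."""
    if not CODING_START <= nucleotide_position <= CODING_END:
        return None
    return (nucleotide_position - CODING_START) // 3 + 1


# SNP positions copied from Table 526.
snp_positions = [367, 457, 464, 754, 1265, 1384, 1402, 1452]
codons = {pos: codon_number(pos) for pos in snp_positions}
print(codons)
```

Under these assumptions, nucleotide positions 367, 457 and 754 fall in codons 36, 66 and 165, which are the amino acid positions listed in Table 525. Position 464 falls in codon 68, which is not listed in Table 525 and would be consistent with a silent change. Positions 1265, 1384, 1402 and 1452 lie downstream of the coding portion.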

As noted above, cluster AI076020 features 8 segment(s), which were listed in Table 521 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster AI076020_node_0 (SEQ ID NO:571) according to the present invention is supported by 28 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AI076020_T0 (SEQ ID NO:58). Table 527 below describes the starting and ending position of this segment on each transcript.

TABLE 527 Segment location on transcripts

Transcript name                 Segment starting position   Segment ending position
AI076020_T0 (SEQ ID NO: 58)     1                           774

Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment (in relation to lung cancer), shown in Table 528.

TABLE 528 Oligonucleotides related to this segment

Oligonucleotide name                Overexpressed in cancers     Chip reference
AI076020_0_3_0 (SEQ ID NO: 226)     lung malignant tumors        LUN

Segment cluster AI076020_node_3 (SEQ ID NO:572) according to the present invention is supported by 30 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AI076020_T0 (SEQ ID NO:58). Table 529 below describes the starting and ending position of this segment on each transcript.

TABLE 529 Segment location on transcripts

Transcript name                 Segment starting position   Segment ending position
AI076020_T0 (SEQ ID NO: 58)     858                         1027

Segment cluster AI076020_node_8 (SEQ ID NO:573) according to the present invention is supported by 35 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AI076020_T0 (SEQ ID NO:58). Table 530 below describes the starting and ending position of this segment on each transcript.

TABLE 530 Segment location on transcripts

Transcript name                 Segment starting position   Segment ending position
AI076020_T0 (SEQ ID NO: 58)     1359                        1533

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.

Segment cluster AI076020_node_1 (SEQ ID NO:574) according to the present invention is supported by 19 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AI076020_T0 (SEQ ID NO:58). Table 531 below describes the starting and ending position of this segment on each transcript.

TABLE 531 Segment location on transcripts

Transcript name                 Segment starting position   Segment ending position
AI076020_T0 (SEQ ID NO: 58)     775                         857

Segment cluster AI076020_node_4 (SEQ ID NO:575) according to the present invention is supported by 28 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AI076020_T0 (SEQ ID NO:58). Table 532 below describes the starting and ending position of this segment on each transcript.

TABLE 532 Segment location on transcripts

Transcript name                 Segment starting position   Segment ending position
AI076020_T0 (SEQ ID NO: 58)     1028                        1129

Segment cluster AI076020_node_5 (SEQ ID NO:576) according to the present invention is supported by 31 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AI076020_T0 (SEQ ID NO:58). Table 533 below describes the starting and ending position of this segment on each transcript.

TABLE 533 Segment location on transcripts

Transcript name                 Segment starting position   Segment ending position
AI076020_T0 (SEQ ID NO: 58)     1130                        1244

Segment cluster AI076020_node_6 (SEQ ID NO:577) according to the present invention is supported by 32 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AI076020_T0 (SEQ ID NO:58). Table 534 below describes the starting and ending position of this segment on each transcript.

TABLE 534 Segment location on transcripts

Transcript name                 Segment starting position   Segment ending position
AI076020_T0 (SEQ ID NO: 58)     1245                        1320

Segment cluster AI076020_node_7 (SEQ ID NO:578) according to the present invention is supported by 33 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): AI076020_T0 (SEQ ID NO:58). Table 535 below describes the starting and ending position of this segment on each transcript.

TABLE 535 Segment location on transcripts

Transcript name                 Segment starting position   Segment ending position
AI076020_T0 (SEQ ID NO: 58)     1321                        1358
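
The segment coordinates given for transcript AI076020_T0 in Tables 527 and 529 to 535 can be checked for consistency with a minimal sketch such as the following. It assumes inclusive 1-based coordinates, reads the Table 527 entry as positions 1 to 774, and treats “up to about 120 bp” as a hard cutoff of 120 for illustration only; the names SEGMENTS, SHORT_LIMIT, segment_length and is_contiguous are hypothetical.

```python
# (segment name, start, end) on AI076020_T0, copied from Tables 527 and 529-535.
SEGMENTS = [
    ("AI076020_node_0", 1, 774),
    ("AI076020_node_1", 775, 857),
    ("AI076020_node_3", 858, 1027),
    ("AI076020_node_4", 1028, 1129),
    ("AI076020_node_5", 1130, 1244),
    ("AI076020_node_6", 1245, 1320),
    ("AI076020_node_7", 1321, 1358),
    ("AI076020_node_8", 1359, 1533),
]
SHORT_LIMIT = 120  # "up to about 120 bp", used here as a hard cutoff (assumption)


def segment_length(start, end):
    """Length of a segment given inclusive 1-based coordinates."""
    return end - start + 1


def is_contiguous(segments):
    """True if each segment starts one position after the previous one ends."""
    ordered = sorted(segments, key=lambda seg: seg[1])
    return all(nxt[1] == cur[2] + 1 for cur, nxt in zip(ordered, ordered[1:]))


short_segments = [name for name, s, e in SEGMENTS if segment_length(s, e) <= SHORT_LIMIT]
long_segments = [name for name, s, e in SEGMENTS if segment_length(s, e) > SHORT_LIMIT]
print(is_contiguous(SEGMENTS), short_segments, long_segments)
```

With these coordinates the eight segments cover positions 1 to 1533 without gaps or overlaps, and the segments of 120 bp or less are nodes 1, 4, 5, 6 and 7, which are the ones presented in the short-segment description above.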

Description for Cluster T23580

Cluster T23580 features 1 transcript(s) and 5 segment(s) of interest, the names for which are given in Tables 536 and 537, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 538.

TABLE 536 Transcripts of interest

Transcript Name     Sequence ID No.
T23580_T10          1626

TABLE 537 Segments of interest

Segment Name       Sequence ID No.
T23580_node_17     579
T23580_node_18     580
T23580_node_21     581
T23580_node_19     582
T23580_node_20     583

TABLE 538 Proteins of interest

Protein Name     Sequence ID No.     Corresponding Transcript(s)
T23580_P5        1335                T23580_T10 (SEQ ID NO: 1626)

These sequences are variants of the known protein Neuronal protein NP25 (SwissProt accession identifier TAG3_HUMAN; known also according to the synonyms Neuronal protein 22; NP22; Transgelin-3), SEQ ID NO: 1435, referred to herein as the previously known protein and also as NP25_HUMAN, which is the former SwissProt accession identifier.

The sequence for protein Neuronal protein NP25 (SEQ ID NO:1435) is given at the end of the application, as “Neuronal protein NP25 amino acid sequence”.

The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: central nervous system development, which are annotation(s) related to Biological Process.

The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <dot expasy dot ch/sprot/>; or Locuslink, available from <dot ncbi dot nlm dot nih dot gov/projects/LocusLink/>.

For this cluster, at least one oligonucleotide was found to demonstrate overexpression of the cluster, although not of at least one transcript/segment as listed below. Microarray (chip) data is also available for this cluster as follows. Various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer, as previously described. The following oligonucleotides were found to hit this cluster but not other segments/transcripts below, shown in Table 539, with regard to lung cancer.

TABLE 539 Oligonucleotides related to this cluster

Oligonucleotide name                Overexpressed in cancers     Chip reference
T23580_0_0_902 (SEQ ID NO: 227)     lung malignant tumors        LUN

As noted above, cluster T23580 features 1 transcript(s), which were listed in Table 536 above. These transcript(s) encode for protein(s) which are variant(s) of protein Neuronal protein NP25 (SEQ ID NO:1435). A description of each variant protein according to the present invention is now provided.

Variant protein T23580_P5 (SEQ ID NO:1335) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T23580_T10 (SEQ ID NO:1626). The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because one of the two signal-peptide prediction programs (HMM:Signalpeptide,NN:NO) predicts that this protein has a signal peptide.

Variant protein T23580_P5 (SEQ ID NO:1335) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 540 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T23580_P5 (SEQ ID NO:1335) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 540 Amino acid mutations

SNP position(s) on amino acid sequence     Alternative amino acid(s)     Previously known SNP?
129                                        V -> I                        Yes

Variant protein T23580_P5 (SEQ ID NO:1335) is encoded by the following transcript(s): T23580_T10 (SEQ ID NO:1626), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T23580_T10 (SEQ ID NO:1626) is shown in bold; this coding portion starts at position 1066 and ends at position 1485. The transcript also has the following SNPs as listed in Table 541 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T23580_P5 (SEQ ID NO:1335) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 541 Nucleic acid SNPs

SNP position on nucleotide sequence     Alternative nucleic acid     Previously known SNP?
37                                      A -> C                       Yes
320                                     G -> A                       Yes
371                                     G -> T                       Yes
372                                     G -> A                       Yes
441                                     A -> G                       Yes
699                                     G -> C                       Yes
744                                     C -> G                       Yes
862                                     G -> T                       Yes
1450                                    G -> A                       Yes

As noted above, cluster T23580 features 5 segment(s), which were listed in Table 537 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster T23580_node_17 (SEQ ID NO:579) according to the present invention is supported by 10 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T23580_T10 (SEQ ID NO:1626). Table 542 below describes the starting and ending position of this segment on each transcript.

TABLE 542 Segment location on transcripts

Transcript name                  Segment starting position   Segment ending position
T23580_T10 (SEQ ID NO: 1626)     1                           1098

Segment cluster T23580_node_18 (SEQ ID NO:580) according to the present invention is supported by 102 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T23580_T10 (SEQ ID NO:1626). Table 543 below describes the starting and ending position of this segment on each transcript.

TABLE 543 Segment location on transcripts

Transcript name                  Segment starting position   Segment ending position
T23580_T10 (SEQ ID NO: 1626)     1099                        1357

Segment cluster T23580_node_21 (SEQ ID NO:581) according to the present invention is supported by 79 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T23580_T10 (SEQ ID NO:1626). Table 544 below describes the starting and ending position of this segment on each transcript.

TABLE 544 Segment location on transcripts

Transcript name                  Segment starting position   Segment ending position
T23580_T10 (SEQ ID NO: 1626)     1382                        1582

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.

Segment cluster T23580_node_19 (SEQ ID NO:582) according to the present invention can be found in the following transcript(s): T23580_T10 (SEQ ID NO:1626). Table 545 below describes the starting and ending position of this segment on each transcript.

TABLE 545 Segment location on transcripts

Transcript name                  Segment starting position   Segment ending position
T23580_T10 (SEQ ID NO: 1626)     1358                        1370

Segment cluster T23580_node_20 (SEQ ID NO:583) according to the present invention can be found in the following transcript(s): T23580_T10 (SEQ ID NO:1626). Table 546 below describes the starting and ending position of this segment on each transcript.

TABLE 546 Segment location on transcripts

Transcript name                  Segment starting position   Segment ending position
T23580_T10 (SEQ ID NO: 1626)     1371                        1381

Description for Cluster M79217

Cluster M79217 features 6 transcript(s) and 32 segment(s) of interest, the names for which are given in Tables 547 and 548, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 549.

TABLE 547 Transcripts of interest

Transcript Name       Sequence ID No.
M79217_PEA_1_T1       59
M79217_PEA_1_T3       60
M79217_PEA_1_T8       61
M79217_PEA_1_T10      62
M79217_PEA_1_T15      63
M79217_PEA_1_T18      64

TABLE 548 Segments of interest

Segment Name             Sequence ID No.
M79217_PEA_1_node_2      584
M79217_PEA_1_node_4      585
M79217_PEA_1_node_9      586
M79217_PEA_1_node_10     587
M79217_PEA_1_node_11     588
M79217_PEA_1_node_13     589
M79217_PEA_1_node_14     590
M79217_PEA_1_node_16     591
M79217_PEA_1_node_23     592
M79217_PEA_1_node_24     593
M79217_PEA_1_node_31     594
M79217_PEA_1_node_33     595
M79217_PEA_1_node_34     596
M79217_PEA_1_node_35     597
M79217_PEA_1_node_37     598
M79217_PEA_1_node_38     599
M79217_PEA_1_node_41     600
M79217_PEA_1_node_44     601
M79217_PEA_1_node_0      602
M79217_PEA_1_node_7      603
M79217_PEA_1_node_12     604
M79217_PEA_1_node_19     605
M79217_PEA_1_node_21     606
M79217_PEA_1_node_26     607
M79217_PEA_1_node_27     608
M79217_PEA_1_node_30     609
M79217_PEA_1_node_32     610
M79217_PEA_1_node_36     611
M79217_PEA_1_node_39     612
M79217_PEA_1_node_40     613
M79217_PEA_1_node_42     614
M79217_PEA_1_node_43     615

TABLE 549 Proteins of interest

Protein Name         Sequence ID No.     Corresponding Transcript(s)
M79217_PEA_1_P1      1336                M79217_PEA_1_T1 (SEQ ID NO: 59); M79217_PEA_1_T3 (SEQ ID NO: 60)
M79217_PEA_1_P2      1337                M79217_PEA_1_T8 (SEQ ID NO: 61)
M79217_PEA_1_P4      1338                M79217_PEA_1_T10 (SEQ ID NO: 62)
M79217_PEA_1_P8      1339                M79217_PEA_1_T15 (SEQ ID NO: 63)
M79217_PEA_1_P11     1340                M79217_PEA_1_T18 (SEQ ID NO: 64)

These sequences are variants of the known protein Exostosin-like 3 (SwissProt accession identifier EXL3_HUMAN; known also according to the synonyms EC 2.4.1.223; Glucuronyl-galactosyl-proteoglycan 4-alpha-N-acetylglucosaminyltransferase; Putative tumor suppressor protein EXTL3; Multiple exostosis-like protein 3; Hereditary multiple exostoses gene isolog; EXT-related protein 1), SEQ ID NO: 1436, referred to herein as the previously known protein.

Protein Exostosin-like 3 (SEQ ID NO:1436) is known or believed to have the following function(s): Probable glycosyltransferase (By similarity). The sequence for protein Exostosin-like 3 is given at the end of the application, as “Exostosin-like 3 amino acid sequence”. Protein Exostosin-like 3 localization is believed to be Type II membrane protein. Endoplasmic reticulum.

The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: cell growth and/or maintenance, which are annotation(s) related to Biological Process; transferase, transferring glycosyl groups, which are annotation(s) related to Molecular Function; and endoplasmic reticulum; integral membrane protein, which are annotation(s) related to Cellular Component.

The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <dot expasy dot ch/sprot/>; or Locuslink, available from <dot ncbi dot nlm dot nih dot gov/projects/LocusLink/>.

As noted above, cluster M79217 features 6 transcript(s), which were listed in Table 547 above. These transcript(s) encode for protein(s) which are variant(s) of protein Exostosin-like 3 (SEQ ID NO:1436). A description of each variant protein according to the present invention is now provided.

Variant protein M79217_PEA_1_P1 (SEQ ID NO:1336) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M79217_PEA_1_T1 (SEQ ID NO:59). An alignment is given to the known protein (Exostosin-like 3 (SEQ ID NO:1436)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between M79217_PEA_1_P1 (SEQ ID NO:1336) and BAA25445 (SEQ ID NO: 1437):

1. An isolated chimeric polypeptide encoding for M79217_PEA_1_P1 (SEQ ID NO:1336), comprising a first amino acid sequence being at least 90% homologous to MTGYTMLRNGGAGNGGQTCMLRWSNRIRLTWLSFTLFVILVFFPLIAHYYLTTLDEADEAGKRIFGPRVGNELCEVKHVLDLCRIRESVSEELLQLEAKRQELNSEIAKLNLKIEACKKSIENAKQDLLQLKNVISQTEHSYKELMAQNQPKLSLPIRLLPEKDDAGLPPPKATRGCRLHNCFDYSRCPLTSGFPVYVYDSDQFVFGSYLDPLVKQAFQATARANVYVTENADIACLYVILVGEMQEPVVLRPAELEKQLYSLPHWRTDGHNHVIINLSRKSDTQNLLYNVSTGRAMVAQSTFYTVQYRPGFDLVVSPLVHAMSEPNFMEIPPQVPVKRKYLFTFQGEKIESLRSSLQEARSFEEEMEGDPPADYDDRIIATLKAVQDSKLDQVLVEFTCKNQPKPSLPTEWALCGEREDRLELLKLSTFALIITPGDPRLVISSGCATRLFEALEVGAVPVVLGEQVQLPYQDMLQWNEAALVVPKPRVTEVHFLLRSLSDSDLLAMRRQGRFLWETYFSTADSIFNTVLAMIRTRIQIPAAPIREEAAAEIPHRSGKAAGTDPNMADNGDLDLGPVETEPPYASPRYLRNFTLTVTDFYRSWNCAPGPFHLFPHTPFDPVLPSEAKFLGSGTGFRPIGGGAGGSGKEFQAALGGNVPREQFTVVMLTYEREEVLMNSLERLNGLPYLNKVVVVWNSPKLPSEDLLWPDIGVPIMVVRTEKNSLNNRFLPWNEIETEAILSIDDDAHLRHDEIMFGFRVWREARDRIVGFPGRYHAWDIPHQSWLYNSNYSCELSMVLTGAAFFHKYYAYLYSYVMPQAIRDMVDEYINCEDIAMNFLVSHITRKPPIKVTSRWTFRCPGCPQALSHDDSHFHERHKCINFFVKVYGYMPLLYTQFRVDSVLFKTRLPHDKTKCFKFI corresponding to amino acids 13-931 of BAA25445 (SEQ ID NO:1437), which also corresponds to amino acids 1-919 of M79217_PEA_1_P1 (SEQ ID NO:1336).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because the Signalp_hmm software predicts that this protein has a signal anchor region.

Variant protein M79217_PEA_1_P1 (SEQ ID NO:1336) is encoded by the following transcript(s): M79217_PEA_1_T1 (SEQ ID NO:59), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M79217_PEA_1_T1 (SEQ ID NO:59) is shown in bold; this coding portion starts at position 1074 and ends at position 3830. The transcript also has the following SNPs as listed in Table 550 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M79217_PEA_1_P1 (SEQ ID NO:1336) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 550 Nucleic acid SNPs

SNP position on nucleotide sequence     Alternative nucleic acid     Previously known SNP?
1014                                    C -> T                       No
1015                                    T ->                         No
1072                                    T -> C                       No
1232                                    T -> A                       No
1383                                    A -> G                       No
1440                                    A -> G                       No
1544                                    C ->                         No
1546                                    G -> A                       No
1685                                    T -> G                       No
2215                                    C ->                         No
2300                                    A -> G                       Yes
2483                                    T -> C                       No
2518                                    C ->                         No
2632                                    T -> G                       No
3190                                    T -> C                       Yes
3352                                    T -> C                       No
3373                                    G -> T                       No
3386                                    C ->                         No
3449                                    C -> T                       Yes
3618                                    A -> G                       No
3733                                    A -> G                       No
4021                                    C ->                         No
4021                                    C -> T                       No
4086                                    G -> A                       No
4087                                    G -> A                       No
4416                                    T -> A                       No
4586                                    G -> A                       Yes
4772                                    C -> T                       No
5110                                    C -> T                       Yes
5219                                    C -> T                       Yes
5437                                    G -> A                       No
5645                                    G -> A                       No
5743                                    G -> A                       Yes
5887                                    G -> T                       Yes
6143                                    A -> C                       No
6277                                    G ->                         No
6277                                    G -> C                       No
6295                                    C -> G                       Yes
6308                                    T -> A                       No
6403                                    G -> A                       Yes
6442                                    G ->                         No
6495                                    C -> T                       No

Variant protein M79217_PEA_1_P2 (SEQ ID NO:1337) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M79217_PEA_1_T8 (SEQ ID NO:61). An alignment is given to the known protein (Exostosin-like 3 (SEQ ID NO:1436)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between M79217_PEA_1_P2 (SEQ ID NO:1337) and EXL3_HUMAN (SEQ ID NO:1436):

1. An isolated chimeric polypeptide encoding for M79217_PEA_(—)1_P2 (SEQID NO:1337), comprising a first amino acid sequence being at least 90%homologous toMTGYTMLRNGGAGNGGQTCMLRWSNRIRLTWLSFTLFVILVFFPLIAHYYLTTLDEADEAGKRIFGPRVGNELCEVKHVLDLCRIRESVSEELLQLEAKRQELNSEIAKLNLKIEACKKSIENAKQDLLQLKNVISQTEHSYKELMAQNQPKLSLPIRLLPEKDDAGLPPPKATRGCRLHNCFDYSRCPLTSGFPVYVYDSDQFVFGSYLDPLVKQAFQATARANVYVTENADIACLYVILVGEMQEPVVLRPAELEKQLYSLPHWRTDGHNHVIINLSRKSDTQNLLYNVSTGRAMVAQSTFYTVQYRPGFDLVVSPLVHAMSEPNFMEIPPQVPVKRKYLFTFQGEKIESLRSSLQEARSFEEEMEGDPPADYDDRIIATLKAVQDSKLDQVLVEFTCKNQPKPSLPTEWALCGEREDRLELLKLSTFALIITPGDPRLVISSGCATRLFEALEVGAVPVVLGEQVQLPYQDMLQWNEAALVVPKPRVTEVHFLLRSLSDSDLLAMRRQGRFLWETYFSTADSIFNTVLAMIRTRIQIPAAPIREEAAAEIPHRSGKAAGTDPNMADNGDLDLGPVETEPPYASPRYLRNFTLTVTDFYRSWNCAPGPFHLFPHTPFDPVLPSEAKFLGSGTGFRPIGGGAGGSGKEFQAALGGNVPREQFTVVMLTYEREEVLMNSLERLNGLPYLNKVVVVWNSPKLPSEDLLWPDIGVPIMVVRTEKNSLNNRFLPWNEIETEAILSIDDDAHLRHDEIMFGFRVWREARDRIVGFPGRYHAWDIPHQSWLYNSNYSCELSMVLTGAAFFHK corresponding to amino acids 1-807 ofEXL3_HUMAN (SEQ ID NO:1436), which also corresponds to amino acids 1-807of M79217_PEA_(—)1_P2 (SEQ ID NO:1337), and a second amino acid sequencebeing at least 90% homologous toAIRDMVDEYINCEDIAMNFLVSHITRKPPIKVTSRWTFRCPGCPQALSHDDSHFHERHKCINFFVKVYGYMPLLYTQFRVDSVLFKTRLPHDKTKCFKFI corresponding to amino acids 820-919 ofEXL3_HUMAN (SEQ ID NO:1436), which also corresponds to amino acids808-907 of M79217_PEA_(—)1_P2 (SEQ ID NO:1337), wherein said first aminoacid sequence and second amino acid sequence are contiguous and in asequential order.

2. An isolated chimeric polypeptide encoding for an edge portion of M79217_PEA_1_P2 (SEQ ID NO:1337), comprising a polypeptide having a length “n”, wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise KA, having a structure as follows: a sequence starting from any of amino acid numbers 807−x to 807; and ending at any of amino acid numbers 808+((n−2)−x), in which x varies from 0 to n−2.
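The start and end formula for the edge portion can be illustrated with a minimal sketch. It assumes 1-based, inclusive amino acid numbering and a variant length of 907 residues (taken from the comparison report above); the function name `edge_windows` and its parameters are hypothetical and serve only to enumerate the windows the formula describes.

```python
from typing import List, Tuple


def edge_windows(junction: int, n: int, protein_length: int) -> List[Tuple[int, int]]:
    """Enumerate (start, end) windows of length n that span the junction.

    The window starts at junction - x and ends at (junction + 1) + ((n - 2) - x),
    with x running from 0 to n - 2, so every window contains both residues
    that flank the junction (here 807 and 808).
    Positions are assumed to be 1-based and inclusive.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to cover both junction residues")
    windows = []
    for x in range(n - 1):  # x = 0, 1, ..., n-2
        start = junction - x
        end = (junction + 1) + ((n - 2) - x)
        # Keep only windows that lie inside the protein.
        if start >= 1 and end <= protein_length:
            windows.append((start, end))
    return windows


if __name__ == "__main__":
    for start, end in edge_windows(junction=807, n=10, protein_length=907):
        print(start, end, end - start + 1)
```

Each window has length n by construction, since end − start + 1 = n for every x.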

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because the Signalp_hmm software predicts that this protein has a signal anchor region.

Variant protein M79217_PEA_1_P2 (SEQ ID NO:1337) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 551 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M79217_PEA_1_P2 (SEQ ID NO:1337) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 551 Amino acid mutations

SNP position(s) on amino acid sequence  Alternative amino acid(s)  Previously known SNP?
104                                     N -> D                     No
123                                     N -> D                     No
157                                     I ->                       No
158                                     R -> Q                     No
204                                     F -> L                     No
381                                     A ->                       No
482                                     A ->                       No
520                                     F -> C                     No
706                                     L -> P                     Yes
760                                     V -> A                     No
767                                     R -> L                     No
771                                     F ->                       No
837                                     I -> V                     No
875                                     Y -> C                     No

The glycosylation sites of variant protein M79217_PEA_1_P2 (SEQ ID NO:1337), as compared to the known protein Exostosin-like 3 (SEQ ID NO:1436), are described in Table 552 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).

TABLE 552 Glycosylation site(s)

Position(s) on known amino acid sequence  Present in variant protein?  Position in variant protein?
290                                       yes                          290
592                                       yes                          592
790                                       yes                          790
277                                       yes                          277

Variant protein M79217_PEA_1_P2 (SEQ ID NO:1337) is encoded by the following transcript(s): M79217_PEA_1_T8 (SEQ ID NO:61), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M79217_PEA_1_T8 (SEQ ID NO:61) is shown in bold; this coding portion starts at position 748 and ends at position 3468. The transcript also has the following SNPs as listed in Table 553 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M79217_PEA_1_P2 (SEQ ID NO:1337) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 553 Nucleic acid SNPs

SNP position on nucleotide sequence  Alternative nucleic acid  Previously known SNP?
688                                  C -> T                    No
689                                  T ->                      No
746                                  T -> C                    No
906                                  T -> A                    No
1057                                 A -> G                    No
1114                                 A -> G                    No
1218                                 C ->                      No
1220                                 G -> A                    No
1359                                 T -> G                    No
1889                                 C ->                      No
1974                                 A -> G                    Yes
2157                                 T -> C                    No
2192                                 C ->                      No
2306                                 T -> G                    No
2864                                 T -> C                    Yes
3026                                 T -> C                    No
3047                                 G -> T                    No
3060                                 C ->                      No
3123                                 C -> T                    Yes
3256                                 A -> G                    No
3371                                 A -> G                    No
3659                                 C ->                      No
3659                                 C -> T                    No
3724                                 G -> A                    No
3725                                 G -> A                    No
4054                                 T -> A                    No
4224                                 G -> A                    Yes
4410                                 C -> T                    No
4748                                 C -> T                    Yes
4857                                 C -> T                    Yes
5075                                 G -> A                    No
5283                                 G -> A                    No
5381                                 G -> A                    Yes
5525                                 G -> T                    Yes
5781                                 A -> C                    No
5915                                 G ->                      No
5915                                 G -> C                    No
5933                                 C -> G                    Yes
5946                                 T -> A                    No
6041                                 G -> A                    Yes
6080                                 G ->                      No
6133                                 C -> T                    No
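The nucleotide positions in Table 553 can be related to the amino acid positions in Table 551 with ordinary codon arithmetic. The sketch below is an illustration only, not a procedure stated in the application: it assumes 1-based, inclusive coordinates and uses the coding portion boundaries given above for transcript M79217_PEA_1_T8 (748 to 3468). The function name `codon_index` is hypothetical.

```python
from typing import Optional

# Coding portion of transcript M79217_PEA_1_T8 as stated in the text
# (assumed 1-based, inclusive).
CDS_START = 748
CDS_END = 3468


def codon_index(nt_pos: int, cds_start: int = CDS_START, cds_end: int = CDS_END) -> Optional[int]:
    """Return the 1-based codon (amino acid) number containing nt_pos.

    Returns None when the position falls outside the coding portion,
    i.e. in an untranslated region.
    """
    if not cds_start <= nt_pos <= cds_end:
        return None
    return (nt_pos - cds_start) // 3 + 1


if __name__ == "__main__":
    # A few positions taken from Table 553.
    for pos in (688, 1057, 1114, 1220, 2864, 3371, 3659):
        print(pos, codon_index(pos))
```

For the positions checked here, the result agrees with Table 551 (for example, nucleotide 1057 falls in codon 104 and nucleotide 2864 in codon 706), while positions 688 and 3659 lie outside the coding portion.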

Variant protein M79217_PEA_1_P4 (SEQ ID NO:1338) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M79217_PEA_1_T10 (SEQ ID NO:62). An alignment is given to the known protein (Exostosin-like 3 (SEQ ID NO:1436)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between M79217_PEA_1_P4 (SEQ ID NO:1338) and EXL3_HUMAN (SEQ ID NO:1436):

1. An isolated chimeric polypeptide encoding for M79217_PEA_1_P4 (SEQ ID NO:1338), comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence PELRQPARLGLPECWDYRHEPRCPAQMGSHFIVQAGLKLLASSKPPKCWDY (SEQ ID NO: 1724) corresponding to amino acids 1-51 of M79217_PEA_1_P4 (SEQ ID NO:1338), and a second amino acid sequence being at least 90% homologous to RVWREARDRIVGFPGRYHAWDIPHQSWLYNSNYSCELSMVLTGAAFFHKYYAYLYSYVMPQAIRDMVDEYINCEDIAMNFLVSHITRKPPIKVTSRWTFRCPGCPQALSHDDSHFHERHKCINFFVKVYGYMPLLYTQFRVDSVLFKTRLPHDKTKCFKFI corresponding to amino acids 759-919 of EXL3_HUMAN (SEQ ID NO:1436), which also corresponds to amino acids 52-212 of M79217_PEA_1_P4 (SEQ ID NO:1338), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a head of M79217_PEA_1_P4 (SEQ ID NO:1338), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence PELRQPARLGLPECWDYRHEPRCPAQMGSHFIVQAGLKLLASSKPPKCWDY (SEQ ID NO: 1724) of M79217_PEA_1_P4 (SEQ ID NO:1338).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because, although it is a partial protein, both trans-membrane region prediction programs predict that this protein has a trans-membrane region.

Variant protein M79217_PEA_1_P4 (SEQ ID NO:1338) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 554 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M79217_PEA_1_P4 (SEQ ID NO:1338) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 554 Amino acid mutations

SNP position(s) on amino acid sequence  Alternative amino acid(s)  Previously known SNP?
53                                      V -> A                     No
60                                      R -> L                     No
64                                      F ->                       No
142                                     I -> V                     No
180                                     Y -> C                     No

The glycosylation sites of variant protein M79217_PEA_1_P4 (SEQ ID NO:1338), as compared to the known protein Exostosin-like 3 (SEQ ID NO:1436), are described in Table 555 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).

TABLE 555 Glycosylation site(s)

Position(s) on known amino acid sequence  Present in variant protein?  Position in variant protein?
290                                       no
592                                       no
790                                       yes                          83
277                                       no
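The positions in Tables 552 and 555 follow from the aligned blocks given in the comparison reports. The sketch below is illustrative only and assumes 1-based, inclusive numbering; the block tuples are taken from the comparison reports above (for M79217_PEA_1_P4, amino acids 759-919 of EXL3_HUMAN correspond to amino acids 52-212 of the variant), and the names `map_position`, `P2_BLOCKS` and `P4_BLOCKS` are hypothetical.

```python
from typing import List, Optional, Tuple

# Each block is (known_start, known_end, variant_start), 1-based inclusive,
# as read from the comparison reports with EXL3_HUMAN.
Block = Tuple[int, int, int]

P2_BLOCKS: List[Block] = [(1, 807, 1), (820, 919, 808)]
P4_BLOCKS: List[Block] = [(759, 919, 52)]


def map_position(known_pos: int, blocks: List[Block]) -> Optional[int]:
    """Map a position on the known protein to the variant protein.

    Returns None when the position is not covered by any aligned block,
    i.e. the site is absent from the variant.
    """
    for known_start, known_end, variant_start in blocks:
        if known_start <= known_pos <= known_end:
            return known_pos - known_start + variant_start
    return None


if __name__ == "__main__":
    sites = [290, 592, 790, 277]  # glycosylation sites on the known protein
    for site in sites:
        print(site, map_position(site, P2_BLOCKS), map_position(site, P4_BLOCKS))
```

Under these assumptions only site 790 is retained in M79217_PEA_1_P4, at position 790 − 759 + 52 = 83, while all four sites keep their positions in M79217_PEA_1_P2.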

Variant protein M79217_PEA_1_P4 (SEQ ID NO:1338) is encoded by the following transcript(s): M79217_PEA_1_T10 (SEQ ID NO:62), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M79217_PEA_1_T10 (SEQ ID NO:62) is shown in bold; this coding portion starts at position 1 and ends at position 637. The transcript also has the following SNPs as listed in Table 556 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M79217_PEA_1_P4 (SEQ ID NO:1338) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 556 Nucleic acid SNPs

SNP position on nucleotide sequence  Alternative nucleic acid  Previously known SNP?
159                                  T -> C                    No
180                                  G -> T                    No
193                                  C ->                      No
256                                  C -> T                    Yes
425                                  A -> G                    No
540                                  A -> G                    No
828                                  C ->                      No
828                                  C -> T                    No
893                                  G -> A                    No
894                                  G -> A                    No
1223                                 T -> A                    No
1393                                 G -> A                    Yes
1579                                 C -> T                    No
1917                                 C -> T                    Yes
2026                                 C -> T                    Yes
2244                                 G -> A                    No
2452                                 G -> A                    No
2550                                 G -> A                    Yes
2694                                 G -> T                    Yes
2950                                 A -> C                    No
3084                                 G ->                      No
3084                                 G -> C                    No
3102                                 C -> G                    Yes
3115                                 T -> A                    No
3210                                 G -> A                    Yes
3249                                 G ->                      No
3302                                 C -> T                    No

Variant protein M79217_PEA_1_P8 (SEQ ID NO:1339) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M79217_PEA_1_T15 (SEQ ID NO:63). An alignment is given to the known protein (Exostosin-like 3 (SEQ ID NO:1436)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between M79217_PEA_1_P8 (SEQ ID NO:1339) and EXL3_HUMAN (SEQ ID NO:1436):

1. An isolated chimeric polypeptide encoding for M79217_PEA_(—)1_P8 (SEQID NO:1339), comprising a first amino acid sequence being at least 90%homologous toMTGYTMLRNGGAGNGGQTCMLRWSNRIRLTWLSFTLFVILVFFPLIAHYYLTTLDEADEAGKRIFGPRVGNELCEVKHVLDLCRIRESVSEELLQLEAKRQELNSEIAKLNLKIEACKKSIENAKQDLLQLKNVISQTEHSYKELMAQNQPKLSLPIRLLPEKDDAGLPPPKATRGCRLHNCFDYSRCPLTSGFPVYVYDSDQFVFGSYLDPLVKQAFQATARANVYVTENADIACLYVILVGEMQEPVVLRPAELEKQLYSLPHWRTDGHNHVIINLSRKSDTQNLLYNVSTGRAMVAQSTFYTVQYRPGFDLVVSPLVHAMSEPNFMEIPPQVPVKRKYLFTFQGEKIESLRSSLQEARSFEEEMEGDPPADYDDRIIATLKAVQDSKLDQVLVEFTCKNQPKPSLPTEWALCGEREDRLELLKLSTFALIITPGDPRLVISSGCATRLFEALEVGAVPVVLGEQVQLPYQDMLQWNEAALVVPKPRVTEVHFLLRSLSDSDLLAMRRQGRFLWETYFSTADSIFNTVLAMIRTRIQIPAAPIREEAAAEIPHRSGKAAGTDPNMADNGDLDLGPVETEPPYASPRYLRNFTLTVTDFYRSWNCAPGPFHLFPHTPFDPVLPSEAKFLGSGTGFRPIGGGAGGSGKEFQAALGGNVPREQFTVVMLTYEREEVLMNSLERLNGLPYLNKVVVVWNSPKLPSEDLLWPDIGVPIMVVRTEKNSLNNRFLPWNEIETEAILSIDDDAHLRHDEIMFGFRVWREARDRIVGFPGRYHAWDIPHQSWLYNSNYSCELSMVLTGAAFFHK corresponding to amino acids 1-807 ofEXL3_HUMAN (SEQ ID NO:1436), which also corresponds to amino acids 1-807of M79217_PEA_(—)1_P8 (SEQ ID NO:1339), and a second amino acid sequencebeing at least 70%, optionally at least 80%, preferably at least 85%,more preferably at least 90% and most preferably at least 95% homologousto a polypeptide having the sequence VRKSW (SEQ ID NO: 1725)corresponding to amino acids 808-812 of M79217_PEA_(—)1_P8 (SEQ IDNO:1339), wherein said first amino acid sequence and second amino acidsequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of M79217_PEA_1_P8 (SEQ ID NO:1339), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRKSW (SEQ ID NO: 1725) in M79217_PEA_1_P8 (SEQ ID NO:1339).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because the Signalp_hmm software predicts that this protein has a signal anchor region.

Variant protein M79217_PEA_1_P8 (SEQ ID NO:1339) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 557 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M79217_PEA_1_P8 (SEQ ID NO:1339) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 557 Amino acid mutations

SNP position(s) on amino acid sequence  Alternative amino acid(s)  Previously known SNP?
104                                     N -> D                     No
123                                     N -> D                     No
157                                     I ->                       No
158                                     R -> Q                     No
204                                     F -> L                     No
381                                     A ->                       No
482                                     A ->                       No
520                                     F -> C                     No
706                                     L -> P                     Yes
760                                     V -> A                     No
767                                     R -> L                     No
771                                     F ->                       No

The glycosylation sites of variant protein M79217_PEA_1_P8 (SEQ ID NO:1339), as compared to the known protein Exostosin-like 3 (SEQ ID NO:1436), are described in Table 558 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).

TABLE 558 Glycosylation site(s)

Position(s) on known amino acid sequence  Present in variant protein?  Position in variant protein?
290                                       yes                          290
592                                       yes                          592
790                                       yes                          790
277                                       yes                          277

Variant protein M79217_PEA_1_P8 (SEQ ID NO:1339) is encoded by the following transcript(s): M79217_PEA_1_T15 (SEQ ID NO:63), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M79217_PEA_1_T15 (SEQ ID NO:63) is shown in bold; this coding portion starts at position 748 and ends at position 3183. The transcript also has the following SNPs as listed in Table 559 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M79217_PEA_1_P8 (SEQ ID NO:1339) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 559 Nucleic acid SNPs

SNP position on nucleotide sequence  Alternative nucleic acid  Previously known SNP?
688                                  C -> T                    No
689                                  T ->                      No
746                                  T -> C                    No
906                                  T -> A                    No
1057                                 A -> G                    No
1114                                 A -> G                    No
1218                                 C ->                      No
1220                                 G -> A                    No
1359                                 T -> G                    No
1889                                 C ->                      No
1974                                 A -> G                    Yes
2157                                 T -> C                    No
2192                                 C ->                      No
2306                                 T -> G                    No
2864                                 T -> C                    Yes
3026                                 T -> C                    No
3047                                 G -> T                    No
3060                                 C ->                      No
3123                                 C -> T                    Yes
3391                                 C -> T                    No
3560                                 T -> C                    No

Variant protein M79217_PEA_1_P11 (SEQ ID NO:1340) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M79217_PEA_1_T18 (SEQ ID NO:64). The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because one of the two signal-peptide prediction programs (HMM: Signal peptide, NN: NO) predicts that this protein has a signal peptide.

Variant protein M79217_PEA_1_P11 (SEQ ID NO:1340) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 560 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M79217_PEA_1_P11 (SEQ ID NO:1340) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 560 Amino acid mutations

SNP position(s) on amino acid sequence  Alternative amino acid(s)  Previously known SNP?
17                                      P ->                       No
28                                      C -> S                     No
72                                      V ->                       No
90                                      S -> F                     No

Variant protein M79217_PEA_1_P11 (SEQ ID NO:1340) is encoded by the following transcript(s): M79217_PEA_1_T18 (SEQ ID NO:64), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M79217_PEA_1_T18 (SEQ ID NO:64) is shown in bold; this coding portion starts at position 1354 and ends at position 1674. The transcript also has the following SNPs as listed in Table 561 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M79217_PEA_1_P11 (SEQ ID NO:1340) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 561 Nucleic acid SNPs

SNP position on nucleotide sequence  Alternative nucleic acid  Previously known SNP?
688                                  C -> T                    No
689                                  T ->                      No
746                                  T -> C                    No
772                                  G -> A                    No
870                                  G -> A                    Yes
1014                                 G -> T                    Yes
1270                                 A -> C                    No
1404                                 G ->                      No
1404                                 G -> C                    No
1422                                 C -> G                    Yes
1435                                 T -> A                    No
1530                                 G -> A                    Yes
1569                                 G ->                      No
1622                                 C -> T                    No

As noted above, cluster M79217 features 32 segment(s), which were listed in Table 548 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster M79217_PEA_1_node_2 (SEQ ID NO:584) according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M79217_PEA_1_T3 (SEQ ID NO:60). Table 562 below describes the starting and ending position of this segment on each transcript.

TABLE 562 Segment location on transcripts

Transcript name                   Segment starting position  Segment ending position
M79217_PEA_1_T3 (SEQ ID NO: 60)   50                         177

Segment cluster M79217_PEA_1_node_4 (SEQ ID NO:585) according to the present invention is supported by 8 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M79217_PEA_1_T8 (SEQ ID NO:61), M79217_PEA_1_T15 (SEQ ID NO:63) and M79217_PEA_1_T18 (SEQ ID NO:64). Table 563 below describes the starting and ending position of this segment on each transcript.

TABLE 563 Segment location on transcripts

Transcript name                   Segment starting position  Segment ending position
M79217_PEA_1_T8 (SEQ ID NO: 61)   1                          177
M79217_PEA_1_T15 (SEQ ID NO: 63)  1                          177
M79217_PEA_1_T18 (SEQ ID NO: 64)  1                          177

Segment cluster M79217_PEA_1_node_9 (SEQ ID NO:586) according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M79217_PEA_1_T1 (SEQ ID NO:59). Table 564 below describes the starting and ending position of this segment on each transcript.

TABLE 564 Segment location on transcripts

Transcript name                   Segment starting position  Segment ending position
M79217_PEA_1_T1 (SEQ ID NO: 59)   1                          597

Segment cluster M79217_PEA_1_node_10 (SEQ ID NO:587) according to the present invention is supported by 33 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M79217_PEA_1_T1 (SEQ ID NO:59), M79217_PEA_1_T3 (SEQ ID NO:60), M79217_PEA_1_T8 (SEQ ID NO:61), M79217_PEA_1_T15 (SEQ ID NO:63) and M79217_PEA_1_T18 (SEQ ID NO:64). Table 565 below describes the starting and ending position of this segment on each transcript.

TABLE 565 Segment location on transcripts

Transcript name                   Segment starting position  Segment ending position
M79217_PEA_1_T1 (SEQ ID NO: 59)   598                        1080
M79217_PEA_1_T3 (SEQ ID NO: 60)   272                        754
M79217_PEA_1_T8 (SEQ ID NO: 61)   272                        754
M79217_PEA_1_T15 (SEQ ID NO: 63)  272                        754
M79217_PEA_1_T18 (SEQ ID NO: 64)  272                        754

Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment (in relation to lung cancer), shown in Table 566.

TABLE 566 Oligonucleotides related to this segment

Oligonucleotide name            Overexpressed in cancers  Chip reference
M79217_0_9_0 (SEQ ID NO: 229)   lung malignant tumors     LUN

Segment cluster M79217_PEA_1_node_11 (SEQ ID NO:588) according to the present invention is supported by 42 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M79217_PEA_1_T1 (SEQ ID NO:59), M79217_PEA_1_T3 (SEQ ID NO:60), M79217_PEA_1_T8 (SEQ ID NO:61) and M79217_PEA_1_T15 (SEQ ID NO:63). Table 567 below describes the starting and ending position of this segment on each transcript.

TABLE 567 Segment location on transcripts

Transcript name                   Segment starting position  Segment ending position
M79217_PEA_1_T1 (SEQ ID NO: 59)   1081                       1523
M79217_PEA_1_T3 (SEQ ID NO: 60)   755                        1197
M79217_PEA_1_T8 (SEQ ID NO: 61)   755                        1197
M79217_PEA_1_T15 (SEQ ID NO: 63)  755                        1197

Segment cluster M79217_PEA_1_node_13 (SEQ ID NO:589) according to the present invention is supported by 35 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M79217_PEA_1_T1 (SEQ ID NO:59), M79217_PEA_1_T3 (SEQ ID NO:60), M79217_PEA_1_T8 (SEQ ID NO:61) and M79217_PEA_1_T15 (SEQ ID NO:63). Table 568 below describes the starting and ending position of this segment on each transcript.

TABLE 568 Segment location on transcripts

Transcript name                   Segment starting position  Segment ending position
M79217_PEA_1_T1 (SEQ ID NO: 59)   1548                       2075
M79217_PEA_1_T3 (SEQ ID NO: 60)   1222                       1749
M79217_PEA_1_T8 (SEQ ID NO: 61)   1222                       1749
M79217_PEA_1_T15 (SEQ ID NO: 63)  1222                       1749

Segment cluster M79217_PEA_1_node_14 (SEQ ID NO:590) according to the present invention is supported by 65 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M79217_PEA_1_T1 (SEQ ID NO:59), M79217_PEA_1_T3 (SEQ ID NO:60), M79217_PEA_1_T8 (SEQ ID NO:61) and M79217_PEA_1_T15 (SEQ ID NO:63). Table 569 below describes the starting and ending position of this segment on each transcript.

TABLE 569 Segment location on transcripts

Transcript name                   Segment starting position  Segment ending position
M79217_PEA_1_T1 (SEQ ID NO: 59)   2076                       3221
M79217_PEA_1_T3 (SEQ ID NO: 60)   1750                       2895
M79217_PEA_1_T8 (SEQ ID NO: 61)   1750                       2895
M79217_PEA_1_T15 (SEQ ID NO: 63)  1750                       2895

Segment cluster M79217_PEA_1_node_16 (SEQ ID NO:591) according to the present invention is supported by 51 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M79217_PEA_1_T1 (SEQ ID NO:59), M79217_PEA_1_T3 (SEQ ID NO:60), M79217_PEA_1_T8 (SEQ ID NO:61) and M79217_PEA_1_T15 (SEQ ID NO:63). Table 570 below describes the starting and ending position of this segment on each transcript.

TABLE 570 Segment location on transcripts

Transcript name                   Segment starting position  Segment ending position
M79217_PEA_1_T1 (SEQ ID NO: 59)   3222                       3349
M79217_PEA_1_T3 (SEQ ID NO: 60)   2896                       3023
M79217_PEA_1_T8 (SEQ ID NO: 61)   2896                       3023
M79217_PEA_1_T15 (SEQ ID NO: 63)  2896                       3023

Segment cluster M79217_PEA_1_node_23 (SEQ ID NO:592) according to the present invention is supported by 50 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M79217_PEA_1_T1 (SEQ ID NO:59), M79217_PEA_1_T3 (SEQ ID NO:60), M79217_PEA_1_T8 (SEQ ID NO:61), M79217_PEA_1_T10 (SEQ ID NO:62) and M79217_PEA_1_T15 (SEQ ID NO:63). Table 571 below describes the starting and ending position of this segment on each transcript.

TABLE 571 Segment location on transcripts

Transcript name                   Segment starting position  Segment ending position
M79217_PEA_1_T1 (SEQ ID NO: 59)   3350                       3494
M79217_PEA_1_T3 (SEQ ID NO: 60)   3024                       3168
M79217_PEA_1_T8 (SEQ ID NO: 61)   3024                       3168
M79217_PEA_1_T10 (SEQ ID NO: 62)  157                        301
M79217_PEA_1_T15 (SEQ ID NO: 63)  3024                       3168
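Because the same segment appears at different coordinates on each transcript, a quick consistency check is to confirm that its length is identical everywhere. The sketch below is illustrative only: it assumes the start and end positions are 1-based and inclusive, hard-codes the values of Table 571, and uses hypothetical helper names (`segment_lengths`, `offsets_relative_to`).

```python
from typing import Dict, Tuple

# Start and end positions of segment M79217_PEA_1_node_23 from Table 571
# (assumed 1-based, inclusive).
TABLE_571: Dict[str, Tuple[int, int]] = {
    "M79217_PEA_1_T1": (3350, 3494),
    "M79217_PEA_1_T3": (3024, 3168),
    "M79217_PEA_1_T8": (3024, 3168),
    "M79217_PEA_1_T10": (157, 301),
    "M79217_PEA_1_T15": (3024, 3168),
}


def segment_lengths(locations: Dict[str, Tuple[int, int]]) -> Dict[str, int]:
    """Length of the segment on each transcript (end - start + 1)."""
    return {name: end - start + 1 for name, (start, end) in locations.items()}


def offsets_relative_to(locations: Dict[str, Tuple[int, int]], reference: str) -> Dict[str, int]:
    """Shift of the segment start on each transcript relative to a reference transcript."""
    ref_start = locations[reference][0]
    return {name: start - ref_start for name, (start, _) in locations.items()}


if __name__ == "__main__":
    print(segment_lengths(TABLE_571))
    print(offsets_relative_to(TABLE_571, "M79217_PEA_1_T1"))
```

A single shared length across all transcripts is what one would expect if the table rows describe the same segment; differing offsets simply reflect the different upstream content of each transcript.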

Segment cluster M79217_PEA_1_node_24 (SEQ ID NO:593) according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M79217_PEA_1_T15 (SEQ ID NO:63). Table 572 below describes the starting and ending position of this segment on each transcript.

TABLE 572 Segment location on transcripts

Transcript name                   Segment starting position  Segment ending position
M79217_PEA_1_T15 (SEQ ID NO: 63)  3169                       3580

Segment cluster M79217_PEA_1_node_31 (SEQ ID NO:594) according to the present invention is supported by 50 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M79217_PEA_1_T1 (SEQ ID NO:59), M79217_PEA_1_T3 (SEQ ID NO:60), M79217_PEA_1_T8 (SEQ ID NO:61) and M79217_PEA_1_T10 (SEQ ID NO:62). Table 573 below describes the starting and ending position of this segment on each transcript.

TABLE 573 Segment location on transcripts

Transcript name                   Segment starting position  Segment ending position
M79217_PEA_1_T1 (SEQ ID NO: 59)   3716                       3960
M79217_PEA_1_T3 (SEQ ID NO: 60)   3390                       3634
M79217_PEA_1_T8 (SEQ ID NO: 61)   3354                       3598
M79217_PEA_1_T10 (SEQ ID NO: 62)  523                        767

Segment cluster M79217_PEA_1_node_33 (SEQ ID NO:595) according to the present invention is supported by 71 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M79217_PEA_1_T1 (SEQ ID NO:59), M79217_PEA_1_T3 (SEQ ID NO:60), M79217_PEA_1_T8 (SEQ ID NO:61) and M79217_PEA_1_T10 (SEQ ID NO:62). Table 574 below describes the starting and ending position of this segment on each transcript.

TABLE 574 Segment location on transcripts

Transcript name                   Segment starting position  Segment ending position
M79217_PEA_1_T1 (SEQ ID NO: 59)   4015                       4631
M79217_PEA_1_T3 (SEQ ID NO: 60)   3689                       4305
M79217_PEA_1_T8 (SEQ ID NO: 61)   3653                       4269
M79217_PEA_1_T10 (SEQ ID NO: 62)  822                        1438

Segment cluster M79217_PEA_1_node_34 (SEQ ID NO:596) according to the present invention is supported by 51 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M79217_PEA_1_T1 (SEQ ID NO:59), M79217_PEA_1_T3 (SEQ ID NO:60), M79217_PEA_1_T8 (SEQ ID NO:61) and M79217_PEA_1_T10 (SEQ ID NO:62). Table 575 below describes the starting and ending position of this segment on each transcript.

TABLE 575 Segment location on transcripts

Transcript name                   Segment starting position  Segment ending position
M79217_PEA_1_T1 (SEQ ID NO: 59)   4632                       4869
M79217_PEA_1_T3 (SEQ ID NO: 60)   4306                       4543
M79217_PEA_1_T8 (SEQ ID NO: 61)   4270                       4507
M79217_PEA_1_T10 (SEQ ID NO: 62)  1439                       1676

Segment cluster M79217_PEA_1_node_35 (SEQ ID NO:597) according to the present invention is supported by 53 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M79217_PEA_1_T1 (SEQ ID NO:59), M79217_PEA_1_T3 (SEQ ID NO:60), M79217_PEA_1_T8 (SEQ ID NO:61) and M79217_PEA_1_T10 (SEQ ID NO:62). Table 576 below describes the starting and ending position of this segment on each transcript.

TABLE 576 Segment location on transcripts

Transcript name                   Segment starting position  Segment ending position
M79217_PEA_1_T1 (SEQ ID NO: 59)   4870                       4997
M79217_PEA_1_T3 (SEQ ID NO: 60)   4544                       4671
M79217_PEA_1_T8 (SEQ ID NO: 61)   4508                       4635
M79217_PEA_1_T10 (SEQ ID NO: 62)  1677                       1804

Segment cluster M79217_PEA_1_node_37 (SEQ ID NO:598) according to the present invention is supported by 58 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M79217_PEA_1_T1 (SEQ ID NO:59), M79217_PEA_1_T3 (SEQ ID NO:60), M79217_PEA_1_T8 (SEQ ID NO:61) and M79217_PEA_1_T10 (SEQ ID NO:62). Table 577 below describes the starting and ending position of this segment on each transcript.

TABLE 577 Segment location on transcripts

Transcript name                   Segment starting position  Segment ending position
M79217_PEA_1_T1 (SEQ ID NO: 59)   5039                       5280
M79217_PEA_1_T3 (SEQ ID NO: 60)   4713                       4954
M79217_PEA_1_T8 (SEQ ID NO: 61)   4677                       4918
M79217_PEA_1_T10 (SEQ ID NO: 62)  1846                       2087

Segment cluster M79217_PEA_1_node_38 (SEQ ID NO:599) according to the present invention is supported by 62 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M79217_PEA_1_T1 (SEQ ID NO:59), M79217_PEA_1_T3 (SEQ ID NO:60), M79217_PEA_1_T8 (SEQ ID NO:61) and M79217_PEA_1_T10 (SEQ ID NO:62). Table 578 below describes the starting and ending position of this segment on each transcript.

TABLE 578 Segment location on transcripts

Transcript name                   Segment starting position  Segment ending position
M79217_PEA_1_T1 (SEQ ID NO: 59)   5281                       5436
M79217_PEA_1_T3 (SEQ ID NO: 60)   4955                       5110
M79217_PEA_1_T8 (SEQ ID NO: 61)   4919                       5074
M79217_PEA_1_T10 (SEQ ID NO: 62)  2088                       2243

Segment cluster M79217_PEA_1_node_41 (SEQ ID NO:600) according to the present invention is supported by 171 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M79217_PEA_1_T1 (SEQ ID NO:59), M79217_PEA_1_T3 (SEQ ID NO:60), M79217_PEA_1_T8 (SEQ ID NO:61), M79217_PEA_1_T10 (SEQ ID NO:62) and M79217_PEA_1_T18 (SEQ ID NO:64). Table 579 below describes the starting and ending position of this segment on each transcript.

TABLE 579 Segment location on transcripts

Transcript name                   Segment starting position  Segment ending position
M79217_PEA_1_T1 (SEQ ID NO: 59)   5628                       6357
M79217_PEA_1_T3 (SEQ ID NO: 60)   5302                       6031
M79217_PEA_1_T8 (SEQ ID NO: 61)   5266                       5995
M79217_PEA_1_T10 (SEQ ID NO: 62)  2435                       3164
M79217_PEA_1_T18 (SEQ ID NO: 64)  755                        1484

Segment cluster M79217_PEA_1_node_44 (SEQ ID NO:601) according to the present invention is supported by 89 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M79217_PEA_1_T1 (SEQ ID NO:59), M79217_PEA_1_T3 (SEQ ID NO:60), M79217_PEA_1_T8 (SEQ ID NO:61), M79217_PEA_1_T10 (SEQ ID NO:62) and M79217_PEA_1_T18 (SEQ ID NO:64). Table 580 below describes the starting and ending position of this segment on each transcript.

TABLE 580 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
M79217_PEA_1_T1 (SEQ ID NO: 59)     6472                        6659
M79217_PEA_1_T3 (SEQ ID NO: 60)     6146                        6333
M79217_PEA_1_T8 (SEQ ID NO: 61)     6110                        6297
M79217_PEA_1_T10 (SEQ ID NO: 62)    3279                        3466
M79217_PEA_1_T18 (SEQ ID NO: 64)    1599                        1786
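By way of non-limiting illustration, the positions given in a segment location table may be cross-checked by confirming that the segment has the same length on every transcript in which it appears. The following Python sketch uses the rows of Table 580; the function name `segment_lengths` and the reading of the positions as 1-based, inclusive coordinates are assumptions made for this example and are not taken from the application.

```python
# Rows copied from Table 580: transcript name -> (starting position, ending position).
# Assumption: positions are 1-based and inclusive.
table_580 = {
    "M79217_PEA_1_T1": (6472, 6659),
    "M79217_PEA_1_T3": (6146, 6333),
    "M79217_PEA_1_T8": (6110, 6297),
    "M79217_PEA_1_T10": (3279, 3466),
    "M79217_PEA_1_T18": (1599, 1786),
}


def segment_lengths(rows):
    """Return the length of the segment on each transcript (hypothetical helper)."""
    return {name: end - start + 1 for name, (start, end) in rows.items()}


lengths = segment_lengths(table_580)
# A single distinct value means the segment length agrees across all transcripts.
consistent = len(set(lengths.values())) == 1
print(lengths, consistent)
```

The same check may be applied to any of the other segment location tables by substituting the corresponding rows.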

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.

Segment cluster M79217_PEA_1_node_0 (SEQ ID NO:602) according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M79217_PEA_1_T3 (SEQ ID NO:60). Table 581 below describes the starting and ending position of this segment on each transcript.

TABLE 581 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
M79217_PEA_1_T3 (SEQ ID NO: 60)     1                           49

Segment cluster M79217_PEA_1_node_7 (SEQ ID NO:603) according to the present invention is supported by 11 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M79217_PEA_1_T3 (SEQ ID NO:60), M79217_PEA_1_T8 (SEQ ID NO:61), M79217_PEA_1_T15 (SEQ ID NO:63) and M79217_PEA_1_T18 (SEQ ID NO:64). Table 582 below describes the starting and ending position of this segment on each transcript.

TABLE 582 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
M79217_PEA_1_T3 (SEQ ID NO: 60)     178                         271
M79217_PEA_1_T8 (SEQ ID NO: 61)     178                         271
M79217_PEA_1_T15 (SEQ ID NO: 63)    178                         271
M79217_PEA_1_T18 (SEQ ID NO: 64)    178                         271

Segment cluster M79217_PEA_1_node_12 (SEQ ID NO:604) according to the present invention can be found in the following transcript(s): M79217_PEA_1_T1 (SEQ ID NO:59), M79217_PEA_1_T3 (SEQ ID NO:60), M79217_PEA_1_T8 (SEQ ID NO:61) and M79217_PEA_1_T15 (SEQ ID NO:63). Table 583 below describes the starting and ending position of this segment on each transcript.

TABLE 583 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
M79217_PEA_1_T1 (SEQ ID NO: 59)     1524                        1547
M79217_PEA_1_T3 (SEQ ID NO: 60)     1198                        1221
M79217_PEA_1_T8 (SEQ ID NO: 61)     1198                        1221
M79217_PEA_1_T15 (SEQ ID NO: 63)    1198                        1221

Segment cluster M79217_PEA_1_node_19 (SEQ ID NO:605) according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M79217_PEA_1_T10 (SEQ ID NO:62). Table 584 below describes the starting and ending position of this segment on each transcript.

TABLE 584 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
M79217_PEA_1_T10 (SEQ ID NO: 62)    1                           79

Segment cluster M79217_PEA_1_node_21 (SEQ ID NO:606) according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M79217_PEA_1_T10 (SEQ ID NO:62). Table 585 below describes the starting and ending position of this segment on each transcript.

TABLE 585 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
M79217_PEA_1_T10 (SEQ ID NO: 62)    80                          156

Segment cluster M79217_PEA_1_node_26 (SEQ ID NO:607) according to the present invention is supported by 40 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M79217_PEA_1_T1 (SEQ ID NO:59), M79217_PEA_1_T3 (SEQ ID NO:60) and M79217_PEA_1_T10 (SEQ ID NO:62). Table 586 below describes the starting and ending position of this segment on each transcript.

TABLE 586 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
M79217_PEA_1_T1 (SEQ ID NO: 59)     3495                        3530
M79217_PEA_1_T3 (SEQ ID NO: 60)     3169                        3204
M79217_PEA_1_T10 (SEQ ID NO: 62)    302                         337

Segment cluster M79217_PEA_1_node_27 (SEQ ID NO:608) according to the present invention is supported by 46 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M79217_PEA_1_T1 (SEQ ID NO:59), M79217_PEA_1_T3 (SEQ ID NO:60), M79217_PEA_1_T8 (SEQ ID NO:61) and M79217_PEA_1_T10 (SEQ ID NO:62). Table 587 below describes the starting and ending position of this segment on each transcript.

TABLE 587 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
M79217_PEA_1_T1 (SEQ ID NO: 59)     3531                        3623
M79217_PEA_1_T3 (SEQ ID NO: 60)     3205                        3297
M79217_PEA_1_T8 (SEQ ID NO: 61)     3169                        3261
M79217_PEA_1_T10 (SEQ ID NO: 62)    338                         430

Segment cluster M79217_PEA_1_node_30 (SEQ ID NO:609) according to the present invention is supported by 47 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M79217_PEA_1_T1 (SEQ ID NO:59), M79217_PEA_1_T3 (SEQ ID NO:60), M79217_PEA_1_T8 (SEQ ID NO:61) and M79217_PEA_1_T10 (SEQ ID NO:62). Table 588 below describes the starting and ending position of this segment on each transcript.

TABLE 588 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
M79217_PEA_1_T1 (SEQ ID NO: 59)     3624                        3715
M79217_PEA_1_T3 (SEQ ID NO: 60)     3298                        3389
M79217_PEA_1_T8 (SEQ ID NO: 61)     3262                        3353
M79217_PEA_1_T10 (SEQ ID NO: 62)    431                         522

Segment cluster M79217_PEA_1_node_32 (SEQ ID NO:610) according to the present invention is supported by 40 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M79217_PEA_1_T1 (SEQ ID NO:59), M79217_PEA_1_T3 (SEQ ID NO:60), M79217_PEA_1_T8 (SEQ ID NO:61) and M79217_PEA_1_T10 (SEQ ID NO:62). Table 589 below describes the starting and ending position of this segment on each transcript.

TABLE 589 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
M79217_PEA_1_T1 (SEQ ID NO: 59)     3961                        4014
M79217_PEA_1_T3 (SEQ ID NO: 60)     3635                        3688
M79217_PEA_1_T8 (SEQ ID NO: 61)     3599                        3652
M79217_PEA_1_T10 (SEQ ID NO: 62)    768                         821

Segment cluster M79217_PEA_1_node_36 (SEQ ID NO:611) according to the present invention is supported by 42 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M79217_PEA_1_T1 (SEQ ID NO:59), M79217_PEA_1_T3 (SEQ ID NO:60), M79217_PEA_1_T8 (SEQ ID NO:61) and M79217_PEA_1_T10 (SEQ ID NO:62). Table 590 below describes the starting and ending position of this segment on each transcript.

TABLE 590 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
M79217_PEA_1_T1 (SEQ ID NO: 59)     4998                        5038
M79217_PEA_1_T3 (SEQ ID NO: 60)     4672                        4712
M79217_PEA_1_T8 (SEQ ID NO: 61)     4636                        4676
M79217_PEA_1_T10 (SEQ ID NO: 62)    1805                        1845

Segment cluster M79217_PEA_1_node_39 (SEQ ID NO:612) according to the present invention is supported by 57 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M79217_PEA_1_T1 (SEQ ID NO:59), M79217_PEA_1_T3 (SEQ ID NO:60), M79217_PEA_1_T8 (SEQ ID NO:61) and M79217_PEA_1_T10 (SEQ ID NO:62). Table 591 below describes the starting and ending position of this segment on each transcript.

TABLE 591 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
M79217_PEA_1_T1 (SEQ ID NO: 59)     5437                        5520
M79217_PEA_1_T3 (SEQ ID NO: 60)     5111                        5194
M79217_PEA_1_T8 (SEQ ID NO: 61)     5075                        5158
M79217_PEA_1_T10 (SEQ ID NO: 62)    2244                        2327

Segment cluster M79217_PEA_1_node_40 (SEQ ID NO:613) according to the present invention is supported by 59 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M79217_PEA_1_T1 (SEQ ID NO:59), M79217_PEA_1_T3 (SEQ ID NO:60), M79217_PEA_1_T8 (SEQ ID NO:61) and M79217_PEA_1_T10 (SEQ ID NO:62). Table 592 below describes the starting and ending position of this segment on each transcript.

TABLE 592 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
M79217_PEA_1_T1 (SEQ ID NO: 59)     5521                        5627
M79217_PEA_1_T3 (SEQ ID NO: 60)     5195                        5301
M79217_PEA_1_T8 (SEQ ID NO: 61)     5159                        5265
M79217_PEA_1_T10 (SEQ ID NO: 62)    2328                        2434

Segment cluster M79217_PEA_1_node_42 (SEQ ID NO:614) according to the present invention is supported by 99 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M79217_PEA_1_T1 (SEQ ID NO:59), M79217_PEA_1_T3 (SEQ ID NO:60), M79217_PEA_1_T8 (SEQ ID NO:61), M79217_PEA_1_T10 (SEQ ID NO:62) and M79217_PEA_1_T18 (SEQ ID NO:64). Table 593 below describes the starting and ending position of this segment on each transcript.

TABLE 593 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
M79217_PEA_1_T1 (SEQ ID NO: 59)     6358                        6443
M79217_PEA_1_T3 (SEQ ID NO: 60)     6032                        6117
M79217_PEA_1_T8 (SEQ ID NO: 61)     5996                        6081
M79217_PEA_1_T10 (SEQ ID NO: 62)    3165                        3250
M79217_PEA_1_T18 (SEQ ID NO: 64)    1485                        1570

Segment cluster M79217_PEA_1_node_43 (SEQ ID NO:615) according to the present invention is supported by 90 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M79217_PEA_1_T1 (SEQ ID NO:59), M79217_PEA_1_T3 (SEQ ID NO:60), M79217_PEA_1_T8 (SEQ ID NO:61), M79217_PEA_1_T10 (SEQ ID NO:62) and M79217_PEA_1_T18 (SEQ ID NO:64). Table 594 below describes the starting and ending position of this segment on each transcript.

TABLE 594 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
M79217_PEA_1_T1 (SEQ ID NO: 59)     6444                        6471
M79217_PEA_1_T3 (SEQ ID NO: 60)     6118                        6145
M79217_PEA_1_T8 (SEQ ID NO: 61)     6082                        6109
M79217_PEA_1_T10 (SEQ ID NO: 62)    3251                        3278
M79217_PEA_1_T18 (SEQ ID NO: 64)    1571                        1598

Variant protein alignment to the previously known protein:

Sequence name: BAA25445 (SEQ ID NO: 1437)
Sequence documentation: Alignment of: M79217_PEA_1_P1 (SEQ ID NO: 1336) × BAA25445 (SEQ ID NO: 1437)
Alignment segment 1/1:
Quality: 9101.00
Escore: 0
Matching length: 919
Total length: 919
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

Sequence name: EXL3_HUMAN (SEQ ID NO: 1436)
Sequence documentation: Alignment of: M79217_PEA_1_P2 (SEQ ID NO: 1337) × EXL3_HUMAN (SEQ ID NO: 1436)
Alignment segment 1/1:
Quality: 8873.00
Escore: 0
Matching length: 907
Total length: 919
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 98.69
Total Percent Identity: 98.69
Gaps: 1
Alignment:

Sequence name: EXL3_HUMAN (SEQ ID NO: 1436)
Sequence documentation: Alignment of: M79217_PEA_1_P4 (SEQ ID NO: 1338) × EXL3_HUMAN (SEQ ID NO: 1436)
Alignment segment 1/1:
Quality: 1668.00
Escore: 0
Matching length: 162
Total length: 162
Matching Percent Similarity: 100.00
Matching Percent Identity: 99.38
Total Percent Similarity: 100.00
Total Percent Identity: 99.38
Gaps: 0
Alignment:

Sequence name: EXL3_HUMAN (SEQ ID NO: 1436)
Sequence documentation: Alignment of: M79217_PEA_1_P8 (SEQ ID NO: 1339) × EXL3_HUMAN (SEQ ID NO: 1436)
Alignment segment 1/1:
Quality: 7947.00
Escore: 0
Matching length: 807
Total length: 807
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
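The percentages reported in the alignment summaries above appear to be simple ratios of residue counts. The following Python sketch reproduces two of the reported values under that assumption. The application does not state how the percentages were computed, so the formula, the helper name `percent`, and the inference that the M79217_PEA_1_P4 alignment contains exactly one non-identical residue (161 of 162) are all assumptions made for illustration only.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals (hypothetical helper)."""
    return round(100.0 * part / whole, 2)


# M79217_PEA_1_P2 x EXL3_HUMAN: matching length 907, total length 919.
# Assumption: "Total Percent Identity" = matching length / total length.
total_identity_p2 = percent(907, 919)

# M79217_PEA_1_P4 x EXL3_HUMAN: matching length 162.
# Assumption: one non-identical residue, i.e. 161 identical positions out of 162.
matching_identity_p4 = percent(161, 162)

print(total_identity_p2, matching_identity_p4)
```

Under these assumptions the computed values agree with the reported figures of 98.69 and 99.38, respectively.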

Description for Cluster M62096

Cluster M62096 features 9 transcript(s) and 42 segment(s) of interest, the names for which are given in Tables 595 and 596, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 597.

TABLE 595 Transcripts of interest

Transcript name      Sequence ID No.
M62096_PEA_1_T4      65
M62096_PEA_1_T5      66
M62096_PEA_1_T6      67
M62096_PEA_1_T7      68
M62096_PEA_1_T9      69
M62096_PEA_1_T11     70
M62096_PEA_1_T13     71
M62096_PEA_1_T14     72
M62096_PEA_1_T15     73

TABLE 596 Segments of interest

Segment Name             Sequence ID No.
M62096_PEA_1_node_0      616
M62096_PEA_1_node_2      617
M62096_PEA_1_node_15     618
M62096_PEA_1_node_17     619
M62096_PEA_1_node_19     620
M62096_PEA_1_node_23     621
M62096_PEA_1_node_27     623
M62096_PEA_1_node_29     624
M62096_PEA_1_node_31     625
M62096_PEA_1_node_34     626
M62096_PEA_1_node_36     627
M62096_PEA_1_node_38     628
M62096_PEA_1_node_40     629
M62096_PEA_1_node_48     630
M62096_PEA_1_node_50     631
M62096_PEA_1_node_56     632
M62096_PEA_1_node_60     633
M62096_PEA_1_node_65     634
M62096_PEA_1_node_69     635
M62096_PEA_1_node_71     636
M62096_PEA_1_node_1      637
M62096_PEA_1_node_4      638
M62096_PEA_1_node_6      639
M62096_PEA_1_node_7      640
M62096_PEA_1_node_9      641
M62096_PEA_1_node_11     642
M62096_PEA_1_node_13     643
M62096_PEA_1_node_21     644
M62096_PEA_1_node_25     645
M62096_PEA_1_node_33     646
M62096_PEA_1_node_42     647
M62096_PEA_1_node_44     648
M62096_PEA_1_node_47     649
M62096_PEA_1_node_51     650
M62096_PEA_1_node_53     651
M62096_PEA_1_node_55     652
M62096_PEA_1_node_58     653
M62096_PEA_1_node_62     654
M62096_PEA_1_node_66     655
M62096_PEA_1_node_67     656
M62096_PEA_1_node_68     657
M62096_PEA_1_node_70     658

TABLE 597 Proteins of interest

Protein Name         Sequence ID No.   Corresponding Transcript(s)
M62096_PEA_1_P4      1341              M62096_PEA_1_T6 (SEQ ID NO: 67)
M62096_PEA_1_P5      1342              M62096_PEA_1_T7 (SEQ ID NO: 68)
M62096_PEA_1_P3      1343              M62096_PEA_1_T9 (SEQ ID NO: 69)
M62096_PEA_1_P7      1344              M62096_PEA_1_T11 (SEQ ID NO: 70)
M62096_PEA_1_P8      1345              M62096_PEA_1_T13 (SEQ ID NO: 71)
M62096_PEA_1_P9      1346              M62096_PEA_1_T14 (SEQ ID NO: 72)
M62096_PEA_1_P10     1347              M62096_PEA_1_T15 (SEQ ID NO: 73)
M62096_PEA_1_P11     1348              M62096_PEA_1_T4 (SEQ ID NO: 65)
M62096_PEA_1_P12     1349              M62096_PEA_1_T5 (SEQ ID NO: 66)

These sequences are variants of the known protein Kinesin heavy chain isoform 5C (SwissProt accession identifier KF5C_HUMAN; known also according to the synonyms Kinesin heavy chain neuron-specific 2), SEQ ID NO: 1438, referred to herein as the previously known protein.

Protein Kinesin heavy chain isoform 5C (SEQ ID NO:1438) is known or believed to have the following function(s): Kinesin is a microtubule-associated force-producing protein that may play a role in organelle transport. The sequence for protein Kinesin heavy chain isoform 5C is given at the end of the application, as “Kinesin heavy chain isoform 5C amino acid sequence”. Known polymorphisms for this sequence are as shown in Table 598.

TABLE 598 Amino acid mutations for Known Protein

SNP position(s) on amino acid sequence   Comment
355-360                                  TLKNVI −> STHASV
583-585                                  EFT −> DRV
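As a non-limiting illustration of how an entry of Table 598 may be read, the following Python sketch applies the first listed polymorphism to a short fragment of the known protein. The fragment TLKNVIQHLEM is taken from the sequences recited in this section and is assumed to span amino acids 355-365 of KF5C_HUMAN, based on the numbering given in the comparison reports; the helper name `apply_substitution` is hypothetical.

```python
def apply_substitution(fragment, fragment_start, first, last, expected, replacement):
    """Replace residues first..last (1-based, inclusive, numbered on the full protein).

    fragment_start is the protein position of the first residue of the fragment.
    Raises ValueError if the fragment does not carry the expected residues there.
    """
    i = first - fragment_start
    j = last - fragment_start + 1
    if fragment[i:j] != expected:
        raise ValueError("fragment does not match the expected residues")
    return fragment[:i] + replacement + fragment[j:]


# Assumption: this fragment corresponds to amino acids 355-365 of KF5C_HUMAN.
fragment = "TLKNVIQHLEM"
variant_fragment = apply_substitution(fragment, 355, 355, 360, "TLKNVI", "STHASV")
print(variant_fragment)
```

The residues outside positions 355-360 are left unchanged by the substitution.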

The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: organelle organization and biogenesis, which are annotation(s) related to Biological Process; microtubule motor; ATP binding, which are annotation(s) related to Molecular Function; and kinesin, which are annotation(s) related to Cellular Component.

The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <dot expasy dot ch/sprot/>; or Locuslink, available from <dot ncbi dot nlm dot nih dot gov/projects/LocusLink/>.

As noted above, cluster M62096 features 9 transcript(s), which were listed in Table 595 above. These transcript(s) encode for protein(s) which are variant(s) of protein Kinesin heavy chain isoform 5C (SEQ ID NO:1438). A description of each variant protein according to the present invention is now provided.

Variant protein M62096_PEA_1_P4 (SEQ ID NO:1341) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M62096_PEA_1_T6 (SEQ ID NO:67). An alignment is given to the known protein (Kinesin heavy chain isoform 5C (SEQ ID NO:1438)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between M62096_PEA_1_P4 (SEQ ID NO:1341) and KF5C_HUMAN (SEQ ID NO:1438):

1. An isolated chimeric polypeptide encoding for M62096_PEA_1_P4 (SEQ ID NO:1341), comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MATYIH (SEQ ID NO: 1726) corresponding to amino acids 1-6 of M62096_PEA_1_P4 (SEQ ID NO:1341), and a second amino acid sequence being at least 90% homologous to VSKTGAEGAVLDEAKNINKSLSALGNVISALAEGTKTHVPYRDSKMTRILQDSLGGNCRTTIVICCSPSVFNEAETKSTLMFGQRAKTIKNTVSVNLELTAEEWKKKYEKEKEKNKTLKNVIQHLEMELNRWRNGEAVPEDEQISAKDQKNLEPCDNTPIIDNIAPVVAGISTEEKEKYDEEISSLYRQLDDKDDEINQQSQLAEKLKQQMLDQDELLASTRRDYEKIQEELTRLQIENEAAKDEVKEVLQALEELAVNYDQKSQEVEDKTRANEQLTDELAQKTTTLTTTQRELSQLQELSNHQKKRATEILNLLLKDLGEIGGIIGTNDVKTLADVNGVIEEEFTMARLYISKMKSEVKSLVNRSKQLESAQMDSNRKMNASERELAACQLLISQHEAKIKSLTDYMQNMEQKRRQLEESQDSLSEELAKLRAQEKMHEVSFQDKEKEHLTRLQDAEEMKKALEQQMESHREAHQKQLSRLRDEIEEKQKIIDEIRDLNQKLQLEQEKLSSDYNKLKIEDQEREMKLEKLLLLNDKREQAREDLKGLEETVSRELQTLHNLRKLFVQDLTTRVKKSVELDNDDGGGSAAQKQKISFLENNLEQLTKVHKQLVRDNADLRCELPKLEKRLRATAERVKALESALKEAKENAMRDRKRYQQEVDRIKEAVRAKNMARRAHSAQIAKPIRPGHYPASSPTAVHAIRGGGGSSSNSTHYQK corresponding to amino acids 239-957 of KF5C_HUMAN (SEQ ID NO:1438), which also corresponds to amino acids 7-725 of M62096_PEA_1_P4 (SEQ ID NO:1341), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a head of M62096_PEA_1_P4 (SEQ ID NO:1341), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MATYIH (SEQ ID NO: 1726) of M62096_PEA_1_P4 (SEQ ID NO:1341).
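The amino acid ranges recited in the comparison report above can be checked for internal consistency: the shared region should have the same length on both proteins, and it should begin immediately after the unique head. The following Python sketch performs this check for M62096_PEA_1_P4. The helper names `span_length` and `known_to_variant` are hypothetical, and the ranges are assumed to be 1-based and inclusive.

```python
def span_length(first, last):
    """Number of residues in a 1-based, inclusive range."""
    return last - first + 1


def known_to_variant(known_pos, known_first, variant_first):
    """Map a residue number on the known protein onto the variant protein."""
    return known_pos - known_first + variant_first


head = "MATYIH"           # unique head, amino acids 1-6 of the variant
known_range = (239, 957)  # shared region on KF5C_HUMAN
variant_range = (7, 725)  # the same region on M62096_PEA_1_P4

# The shared region must be equally long on both proteins.
same_length = span_length(*known_range) == span_length(*variant_range)
# The shared region must start right after the head ("contiguous").
contiguous = variant_range[0] == len(head) + 1
# The last shared residue of the known protein maps to the variant's last residue.
last_mapped = known_to_variant(known_range[1], known_range[0], variant_range[0])
print(same_length, contiguous, last_mapped)
```

The same arithmetic may be applied to the ranges recited for the other variants of this cluster.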

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.

Variant protein M62096_PEA_1_P4 (SEQ ID NO:1341) is encoded by the following transcript(s): M62096_PEA_1_T6 (SEQ ID NO:67), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M62096_PEA_1_T6 (SEQ ID NO:67) is shown in bold; this coding portion starts at position 108 and ends at position 2282. The transcript also has the following SNPs as listed in Table 599 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M62096_PEA_1_P4 (SEQ ID NO:1341) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 599 Nucleic acid SNPs

SNP position on nucleotide sequence   Alternative nucleic acid   Previously known SNP?
5757                                  G −> T                     No
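The coding-portion coordinates and the SNP position given above can be related by simple arithmetic. The following Python sketch computes the length of the coding portion of M62096_PEA_1_T6 in nucleotides and codons and tests whether the SNP of Table 599 lies inside it. The helper names are hypothetical, and the coordinates are assumed to be 1-based and inclusive; the observation that the codon count equals the 725 amino acids recited in the comparison report suggests, but does not establish, that the stated range excludes a stop codon.

```python
def coding_summary(start, end):
    """Return (nucleotides, whole codons, leftover nucleotides) for a coding range."""
    length = end - start + 1
    return length, length // 3, length % 3


def in_coding_portion(position, start, end):
    """True if a nucleotide position falls inside the coding range."""
    return start <= position <= end


# Coding portion of M62096_PEA_1_T6 as stated in the text.
nt_len, codons, remainder = coding_summary(108, 2282)
# SNP position from Table 599.
snp_in_cds = in_coding_portion(5757, 108, 2282)
print(nt_len, codons, remainder, snp_in_cds)
```

Under these assumptions the SNP at position 5757 falls downstream of the coding portion.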

Variant protein M62096_PEA_1_P5 (SEQ ID NO:1342) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M62096_PEA_1_T7 (SEQ ID NO:68). An alignment is given to the known protein (Kinesin heavy chain isoform 5C (SEQ ID NO:1438)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between M62096_PEA_1_P5 (SEQ ID NO:1342) and KF5C_HUMAN (SEQ ID NO:1438):

1. An isolated chimeric polypeptide encoding for M62096_PEA_1_P5 (SEQ ID NO:1342), comprising a first amino acid sequence being at least 90% homologous to MTRILQDSLGGNCRTTIVICCSPSVFNEAETKSTLMFGQRAKTIKNTVSVNLELTAEEWKKKYEKEKEKNKTLKNVIQHLEMELNRWRNGEAVPEDEQISAKDQKNLEPCDNTPIIDNIAPVVAGISTEEKEKYDEEISSLYRQLDDKDDEINQQSQLAEKLKQQMLDQDELLASTRRDYEKIQEELTRLQIENEAAKDEVKEVLQALEELAVNYDQKSQEVEDKTRANEQLTDELAQKTTTLTTTQRELSQLQELSNHQKKRATEILNLLLKDLGEIGGIIGTNDVKTLADVNGVIEEEFTMARLYISKMKSEVKSLVNRSKQLESAQMDSNRKMNASERELAACQLLISQHEAKIKSLTDYMQNMEQKRRQLEESQDSLSEELAKLRAQEKMHEVSFQDKEKEHLTRLQDAEEMKKALEQQMESHREAHQKQLSRLRDEIEEKQKIIDEIRDLNQKLQLEQEKLSSDYNKLKIEDQEREMKLEKLLLLNDKREQAREDLKGLEETVSRELQTLHNLRKLFVQDLTTRVKKSVELDNDDGGGSAAQKQKISFLENNLEQLTKVHKQLVRDNADLRCELPKLEKRLRATAERVKALESALKEAKENAMRDRKRYQQEVDRIKEAVRAKNMARRAHSAQIAKPIRPGHYPASSPTAVHAIRGGGGSSSNSTHYQK corresponding to amino acids 284-957 of KF5C_HUMAN (SEQ ID NO:1438), which also corresponds to amino acids 1-674 of M62096_PEA_1_P5 (SEQ ID NO:1342).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.

Variant protein M62096_PEA_1_P5 (SEQ ID NO:1342) is encoded by the following transcript(s): M62096_PEA_1_T7 (SEQ ID NO:68), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M62096_PEA_1_T7 (SEQ ID NO:68) is shown in bold; this coding portion starts at position 283 and ends at position 2304. The transcript also has the following SNPs as listed in Table 600 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M62096_PEA_1_P5 (SEQ ID NO:1342) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 600 Nucleic acid SNPs

SNP position on nucleotide sequence   Alternative nucleic acid   Previously known SNP?
5779                                  G −> T                     No

Variant protein M62096_PEA_1_P3 (SEQ ID NO:1343) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M62096_PEA_1_T9 (SEQ ID NO:69). An alignment is given to the known protein (Kinesin heavy chain isoform 5C (SEQ ID NO:1438)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between M62096_PEA_1_P3 (SEQ ID NO:1343) and KF5C_HUMAN (SEQ ID NO:1438):

1. An isolated chimeric polypeptide encoding for M62096_PEA_1_P3 (SEQ ID NO:1343), comprising a first amino acid sequence being at least 90% homologous to MELNRWRNGEAVPEDEQISAKDQKNLEPCDNTPIIDNIAPVVAGISTEEKEKYDEEISSLYRQLDDKDDEINQQSQLAEKLKQQMLDQDELLASTRRDYEKIQEELTRLQIENEAAKDEVKEVLQALEELAVNYDQKSQEVEDKTRANEQLTDELAQKTTTLTTTQRELSQLQELSNHQKKRATEILNLLLKDLGEIGGIIGTNDVKTLADVNGVIEEEFTMARLYISKMKSEVKSLVNRSKQLESAQMDSNRKMNASERELAACQLLISQHEAKIKSLTDYMQNMEQKRRQLEESQDSLSEELAKLRAQEKMHEVSFQDKEKEHLTRLQDAEEMKKALEQQMESHREAHQKQLSRLRDEIEEKQKIIDEIRDLNQKLQLEQEKLSSDYNKLKIEDQEREMKLEKLLLLNDKREQAREDLKGLEETVSRELQTLHNLRKLFVQDLTTRVKKSVELDNDDGGGSAAQKQKISFLENNLEQLTKVHKQLVRDNADLRCELPKLEKRLRATAERVKALESALKEAKENAMRDRKRYQQEVDRIKEAVRAKNMARRAHSAQIAKPIRPGHYPASSPTAVHAIRGGGGSSSNSTHYQK corresponding to amino acids 365-957 of KF5C_HUMAN (SEQ ID NO:1438), which also corresponds to amino acids 1-593 of M62096_PEA_1_P3 (SEQ ID NO:1343).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.

Variant protein M62096_PEA_1_P3 (SEQ ID NO:1343) is encoded by the following transcript(s): M62096_PEA_1_T9 (SEQ ID NO:69), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M62096_PEA_1_T9 (SEQ ID NO:69) is shown in bold; this coding portion starts at position 565 and ends at position 2343. The transcript also has the following SNPs as listed in Table 601 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M62096_PEA_1_P3 (SEQ ID NO:1343) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 601 Nucleic acid SNPs

SNP position on nucleotide sequence   Alternative nucleic acid   Previously known SNP?
5818                                  G −> T                     No

Variant protein M62096_PEA_1_P7 (SEQ ID NO:1344) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M62096_PEA_1_T11 (SEQ ID NO:70). An alignment is given to the known protein (Kinesin heavy chain isoform 5C (SEQ ID NO:1438)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between M62096_PEA_1_P7 (SEQ ID NO:1344) and KF5C_HUMAN (SEQ ID NO:1438):

1. An isolated chimeric polypeptide encoding for M62096_PEA_1_P7 (SEQ ID NO:1344), comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MTQNFRLMWNILLFPLNFS (SEQ ID NO: 1727) corresponding to amino acids 1-19 of M62096_PEA_1_P7 (SEQ ID NO:1344), and a second amino acid sequence being at least 90% homologous to LNQKLQLEQEKLSSDYNKLKIEDQEREMKLEKLLLLNDKREQAREDLKGLEETVSRELQTLHNLRKLFVQDLTTRVKKSVELDNDDGGGSAAQKQKISFLENNLEQLTKVHKQLVRDNADLRCELPKLEKRLRATAERVKALESALKEAKENAMRDRKRYQQEVDRIKEAVRAKNMARRAHSAQIAKPIRPGHYPASSPTAVHAIRGGGGSSSNSTHYQK corresponding to amino acids 738-957 of KF5C_HUMAN (SEQ ID NO:1438), which also corresponds to amino acids 20-239 of M62096_PEA_1_P7 (SEQ ID NO:1344), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a head of M62096_PEA_1_P7 (SEQ ID NO:1344), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MTQNFRLMWNILLFPLNFS (SEQ ID NO: 1727) of M62096_PEA_1_P7 (SEQ ID NO:1344).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because one of the two signal-peptide prediction programs (HMM: Non-secretory protein, NN: YES) predicts that this protein has a signal peptide.

Variant protein M62096_PEA_1_P7 (SEQ ID NO:1344) is encoded by the following transcript(s): M62096_PEA_1_T11 (SEQ ID NO:70), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M62096_PEA_1_T11 (SEQ ID NO:70) is shown in bold; this coding portion starts at position 633 and ends at position 1349. The transcript also has the following SNPs as listed in Table 602 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M62096_PEA_1_P7 (SEQ ID NO:1344) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 602 Nucleic acid SNPs

SNP position on nucleotide sequence   Alternative nucleic acid   Previously known SNP?
4824                                  G −> T                     No

Variant protein M62096_PEA_1_P8 (SEQ ID NO:1345) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M62096_PEA_1_T13 (SEQ ID NO:71). An alignment is given to the known protein (Kinesin heavy chain isoform 5C (SEQ ID NO:1438)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between M62096_PEA_1_P8 (SEQ ID NO:1345) and KF5C_HUMAN (SEQ ID NO:1438):

1. An isolated chimeric polypeptide encoding for M62096_PEA_1_P8 (SEQ ID NO:1345), comprising a first amino acid sequence being at least 90% homologous to MADPAECSIKVMCRFRPLNEAEILRGDKFIPKFKGDETVVIGQGKPYVFDRVLPPNTTQEQVYNACAKQIVKDVLEGYNGTIFAYGQTSSGKTHTMEGKLHDPQLMGIIPRIAHDINFDHIYSMDENLEFHIKVSYFEIYLDKIRDLLDVSKTNLAVHEDKNRVPYVKGCTERFVSSPEEVMDVIDEGKANRHVAVTNMNEHSSRSHSIFLINIKQENVETEKKLSGKLYLVDLAGSEKVSKTGAEGAVLDEAKNINKSLSALGNVISALAEGTKTHVPYRDSKMTRILQDSLGGNCRTTIVICCSPSVFNEAETKSTLMFGQRAKTIKNTVSVNLELTAEEWKKKYEKEKEKNKTLKNVIQHLEMELNRWRNGEAVPEDEQISAKDQKNLEPCDNTPIIDNIAPVVAGISTEEKEKYDEEISSLYRQLDDKDDEINQQSQLAEKLKQQMLDQDELLASTRRDYEKIQEELTRLQIENEAAKDEVKEVLQALEELAVNYDQKSQEVEDKTRANEQLTDELAQKTTTLTTTQRELSQLQELSNHQKKRATEILNLLLKDLGEIGGIIGTNDVKTLADVNGVIEEEFTMARLYISKMKSEVKSLVNRSKQLESAQMDSNRKMNASERELAACQLLISQHEAKIKSLTDYMQNMEQKRRQLEESQDSLSEELAKLRAQEKMHEVSFQDKEKEHLTRLQDAEEMKKALEQQMESHREAHQKQLSRLRDEIEEKQKIIDEIR corresponding to amino acids 1-736 of KF5C_HUMAN (SEQ ID NO:1438), which also corresponds to amino acids 1-736 of M62096_PEA_1_P8 (SEQ ID NO:1345), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence E corresponding to amino acids 737-737 of M62096_PEA_1_P8 (SEQ ID NO:1345), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.

Variant protein M62096_PEA_1_P8 (SEQ ID NO:1345) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 603 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M62096_PEA_1_P8 (SEQ ID NO:1345) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 603 Amino acid mutations
SNP position(s) on amino acid sequence    Alternative amino acid(s)    Previously known SNP?
5                                         A -> T                       Yes

Variant protein M62096_PEA_1_P8 (SEQ ID NO:1345) is encoded by the following transcript(s): M62096_PEA_1_T13 (SEQ ID NO:71), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M62096_PEA_1_T13 (SEQ ID NO:71) is shown in bold; this coding portion starts at position 396 and ends at position 2606. The transcript also has the following SNPs as listed in Table 604 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M62096_PEA_1_P8 (SEQ ID NO:1345) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 604 Nucleic acid SNPs
SNP position on nucleotide sequence    Alternative nucleic acid    Previously known SNP?
92                                     C -> A                      Yes
408                                    G -> A                      Yes
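
The coding coordinates and the two SNP tables for this variant can be cross-checked with simple arithmetic. The Python sketch below is illustrative only: the function names are hypothetical, and it assumes 1-based inclusive transcript coordinates and a coding portion made up of whole codons, neither of which is stated explicitly in the text. Under those assumptions, positions 396 to 2606 span 737 codons, matching the 737 residues of the variant, and the nucleotide SNP at position 408 falls on the first base of codon 5, consistent with the amino acid change at position 5.

```python
from typing import Tuple


def codon_count(cds_start: int, cds_end: int) -> int:
    """Number of whole codons in a coding portion with 1-based inclusive ends."""
    span = cds_end - cds_start + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("coding portion is not a positive multiple of three")
    return span // 3


def residue_for_position(nt_pos: int, cds_start: int, cds_end: int) -> Tuple[int, int]:
    """Return (1-based residue number, 0-based base index within the codon)."""
    if not cds_start <= nt_pos <= cds_end:
        raise ValueError("position lies outside the coding portion")
    offset = nt_pos - cds_start
    return offset // 3 + 1, offset % 3


# Values quoted in the text for transcript M62096_PEA_1_T13 (SEQ ID NO:71).
CDS_START, CDS_END = 396, 2606

print(codon_count(CDS_START, CDS_END))                # 737
print(residue_for_position(408, CDS_START, CDS_END))  # (5, 0)
```

Position 92 lies upstream of the coding portion, so the sketch raises an error for it rather than returning a residue; this agrees with the single entry in the amino acid table.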

Variant protein M62096_PEA_1_P9 (SEQ ID NO:1346) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M62096_PEA_1_T14 (SEQ ID NO:72). An alignment is given to the known protein (Kinesin heavy chain isoform 5C (SEQ ID NO:1438)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between M62096_PEA_1_P9 (SEQ ID NO:1346) and KF5C_HUMAN (SEQ ID NO:1438):

1. An isolated chimeric polypeptide encoding for M62096_PEA_1_P9 (SEQ ID NO:1346), comprising a first amino acid sequence being at least 90% homologous to MADPAECSIKVMCRFRPLNEAEILRGDKFIPKFKGDETVVIGQGKPYVFDRVLPPNTTQEQVYNACAKQIVKDVLEGYNGTIFAYGQTSSGKTHTMEGKLHDPQLMGIIPRIAHDIFDHIYSMDENLEFHIKVSYFEIYLDKIRDLLDVSKTNLAVHEDKNRVPYVKGCTERFVSSPEEVMDVIDEGKANRHVAVTNMNEHSSRSHSIFLINIKQENVETEKKLSGKLYLVDLAGSEKVSKTGAEGAVLDEAKNINKSLSALGNVISALAEGTKTHVPYRDSKMTRILQDSLGGNCRTTIVICCSPSVFNEAETKSTLMFGQRAKTIKNTVSVNLELTAEEWKKKYEKEKEKNKTLKNVIQHLEMELNRWRNGEAVPEDEQISAKDQKNLEPCDNTPIIDNIAPVVAGISTEEKEKYDEEISSLYRQLDDKDDEINQQSQLAEKLKQQMLDQDE corresponding to amino acids 1-454 of KF5C_HUMAN (SEQ ID NO:1438), which also corresponds to amino acids 1-454 of M62096_PEA_1_P9 (SEQ ID NO:1346), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VKNAIYFFFHKVLLLLFVVDVCSRNLIGIEAFHNYRIMWKFLGRCPFTASYKLIITEFRK (SEQ ID NO: 1728) corresponding to amino acids 455-514 of M62096_PEA_1_P9 (SEQ ID NO:1346), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of M62096_PEA_1_P9 (SEQ ID NO:1346), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VKNAIYFFFHKVLLLLFVVDVCSRNLIGIEAFHNYRIMWKFLGRCPFTASYKLIITEFRK (SEQ ID NO: 1728) in M62096_PEA_1_P9 (SEQ ID NO:1346).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.

Variant protein M62096_PEA_1_P9 (SEQ ID NO:1346) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 605 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M62096_PEA_1_P9 (SEQ ID NO:1346) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 605 Amino acid mutations
SNP position(s) on amino acid sequence    Alternative amino acid(s)    Previously known SNP?
5                                         A -> T                       Yes

Variant protein M62096_PEA_1_P9 (SEQ ID NO:1346) is encoded by the following transcript(s): M62096_PEA_1_T14 (SEQ ID NO:72), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M62096_PEA_1_T14 (SEQ ID NO:72) is shown in bold; this coding portion starts at position 396 and ends at position 1937. The transcript also has the following SNPs as listed in Table 606 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M62096_PEA_1_P9 (SEQ ID NO:1346) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 606 Nucleic acid SNPs
SNP position on nucleotide sequence    Alternative nucleic acid    Previously known SNP?
92                                     C -> A                      Yes
408                                    G -> A                      Yes

Variant protein M62096_PEA_1_P10 (SEQ ID NO:1347) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M62096_PEA_1_T15 (SEQ ID NO:73). An alignment is given to the known protein (Kinesin heavy chain isoform 5C (SEQ ID NO:1438)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between M62096_PEA_1_P10 (SEQ ID NO:1347) and KF5C_HUMAN (SEQ ID NO:1438):

1. An isolated chimeric polypeptide encoding for M62096_PEA_1_P10 (SEQ ID NO:1347), comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MTQNFRLMWNILLFPLNFS (SEQ ID NO: 1727) corresponding to amino acids 1-19 of M62096_PEA_1_P10 (SEQ ID NO:1347), a second amino acid sequence being at least 90% homologous to LNQKLQLEQEKLSSDYNKLKIEDQEREMKLEKLLLLNDKREQAREDLKGLEETVSRELQTLHNLRKLFVQDLTTRVKK corresponding to amino acids 738-815 of KF5C_HUMAN (SEQ ID NO:1438), which also corresponds to amino acids 20-97 of M62096_PEA_1_P10 (SEQ ID NO:1347), and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VSSLCLNGTEKKIKDGREESFSVEISLA (SEQ ID NO: 1730) corresponding to amino acids 98-125 of M62096_PEA_1_P10 (SEQ ID NO:1347), wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a head of M62096_PEA_1_P10 (SEQ ID NO:1347), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MTQNFRLMWNILLFPLNFS (SEQ ID NO: 1727) of M62096_PEA_1_P10 (SEQ ID NO:1347).

3. An isolated polypeptide encoding for a tail of M62096_PEA_1_P10 (SEQ ID NO:1347), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VSSLCLNGTEKKIKDGREESFSVEISLA (SEQ ID NO: 1730) in M62096_PEA_1_P10 (SEQ ID NO:1347).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because one of the two signal-peptide prediction programs (HMM: Non-secretory protein, NN: YES) predicts that this protein has a signal peptide.
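
The localization calls in this section appear to follow a simple rule: a variant is called secreted when at least one signal-peptide predictor reports a signal peptide, and intracellular when no predictor reports either a signal peptide or a trans-membrane region. The Python sketch below encodes that reading. It is an interpretation of the prose, not the procedure actually used; the function name is hypothetical, and the "membrane" branch is an assumption added only to make the function total, since that case is not described here.

```python
from typing import Sequence


def call_localization(tm_calls: Sequence[bool], signal_calls: Sequence[bool]) -> str:
    """Combine boolean predictor outputs into a single localization label.

    tm_calls: one entry per trans-membrane predictor (True = region predicted).
    signal_calls: one entry per signal-peptide predictor (True = peptide predicted).
    """
    if any(signal_calls):
        # One positive signal-peptide call is enough, as for M62096_PEA_1_P10.
        return "secreted"
    if any(tm_calls):
        # Assumption: this case is not covered by the passage.
        return "membrane"
    return "intracellular"


# M62096_PEA_1_P10: HMM says non-secretory, NN says yes. The trans-membrane
# calls are not reported for this variant, so False is a placeholder.
print(call_localization(tm_calls=[False, False], signal_calls=[False, True]))

# Variants such as M62096_PEA_1_P9: no trans-membrane region, no signal peptide.
print(call_localization(tm_calls=[False, False], signal_calls=[False, False]))
```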

Variant protein M62096_PEA_1_P10 (SEQ ID NO:1347) is encoded by the following transcript(s): M62096_PEA_1_T15 (SEQ ID NO:73), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M62096_PEA_1_T15 (SEQ ID NO:73) is shown in bold; this coding portion starts at position 633 and ends at position 1007.
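
The residue ranges recited for this chimeric polypeptide can be checked against the coding coordinates of the transcript. The Python sketch below is a minimal illustration with a hypothetical helper name; it assumes 1-based inclusive coordinates and a coding portion consisting of whole codons. It confirms that the three ranges are contiguous, that they total 125 residues, that positions 633 to 1007 span 125 codons, and that the head and tail sequences have the lengths implied by their ranges.

```python
from typing import Iterable, Tuple

HEAD = "MTQNFRLMWNILLFPLNFS"           # SEQ ID NO: 1727, residues 1-19
TAIL = "VSSLCLNGTEKKIKDGREESFSVEISLA"  # SEQ ID NO: 1730, residues 98-125


def total_residues(segments: Iterable[Tuple[int, int]]) -> int:
    """Verify that 1-based inclusive ranges tile the protein from residue 1.

    Returns the total number of residues covered, or raises ValueError if
    there is a gap, an overlap, or an inverted range.
    """
    expected_first = 1
    for first, last in segments:
        if first != expected_first or last < first:
            raise ValueError(f"range {first}-{last} is not contiguous")
        expected_first = last + 1
    return expected_first - 1


segments = [(1, 19), (20, 97), (98, 125)]
cds_start, cds_end = 633, 1007
codons = (cds_end - cds_start + 1) // 3

print(total_residues(segments), codons)  # 125 125
print(len(HEAD), len(TAIL))              # 19 28
```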

Variant protein M62096_PEA_1_P11 (SEQ ID NO:1348) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M62096_PEA_1_T4 (SEQ ID NO:65). An alignment is given to the known protein (Kinesin heavy chain isoform 5C (SEQ ID NO:1438)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between M62096_PEA_1_P11 (SEQ ID NO:1348) and KF5C_HUMAN (SEQ ID NO:1438):

1. An isolated chimeric polypeptide encoding for M62096_PEA_1_P11 (SEQ ID NO:1348), comprising a first amino acid sequence being at least 90% homologous to MADPAECSIKVMCRFRPLNEAEILRGDKFIPKFKGDETVVIGQGKPYVFDRVLPPNTTQEQVYNACAKQIVKDVLEGYNGTIFAYGQTSSGKTHTMEGKLHDPQLMGIIPRIAHDIFDHIYSMDENLEFHIKVSYFEIYLDKIRDLLDVSKTNLAVHEDKNRVPYVKGCTERFVSSPEEVMDVIDEGKANRHVAVTNMNEHSSRSHSIFLINIKQENVETEKKLSGKLYLVDLAGSEKVSKTGAEGAVLDEAKNINKSLSALGNVISALAEGTKTHVPYRDSKMTRILQDSLGGNCRTTIVICCSPSVFNEAETKSTLMFGQRAKTIKNTVSVNLELTAEEWKKKYEKEKEKNKTLKNVIQHLEMELNRWRN corresponding to amino acids 1-372 of KF5C_HUMAN (SEQ ID NO:1438), which also corresponds to amino acids 1-372 of M62096_PEA_1_P11 (SEQ ID NO:1348), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DFLAAHVFGKLLE (SEQ ID NO: 1731) corresponding to amino acids 373-385 of M62096_PEA_1_P11 (SEQ ID NO:1348), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of M62096_PEA_1_P11 (SEQ ID NO:1348), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DFLAAHVFGKLLE (SEQ ID NO: 1731) in M62096_PEA_1_P11 (SEQ ID NO:1348).
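
The tiered homology thresholds recited above (70%, 80%, 85%, 90%, 95%) can be illustrated with a toy calculation. The Python sketch below is only an illustration: it uses position-by-position identity over two equal-length, ungapped sequences as a stand-in for homology, which is an assumption and not the alignment procedure the application relies on. The function names and the candidate sequence are hypothetical.

```python
from typing import Optional, Sequence


def percent_identity(a: str, b: str) -> float:
    """Percentage of matching positions between two equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def highest_threshold_met(identity: float,
                          thresholds: Sequence[int] = (70, 80, 85, 90, 95)) -> Optional[int]:
    """Largest recited threshold that the identity value reaches, if any."""
    met = [t for t in thresholds if identity >= t]
    return max(met) if met else None


TAIL_P11 = "DFLAAHVFGKLLE"   # SEQ ID NO: 1731, residues 373-385
CANDIDATE = "DFLAAHVFGKLLD"  # hypothetical sequence with one substitution

identity = percent_identity(TAIL_P11, CANDIDATE)
print(round(identity, 1), highest_threshold_met(identity))
```

With one mismatch in 13 residues the identity is 12/13, so a single substitution in this short tail already falls below the 95% tier.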

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.

Variant protein M62096_PEA_1_P11 (SEQ ID NO:1348) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 607 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M62096_PEA_1_P11 (SEQ ID NO:1348) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 607 Amino acid mutations
SNP position(s) on amino acid sequence    Alternative amino acid(s)    Previously known SNP?
5                                         A -> T                       Yes

Variant protein M62096_PEA_1_P11 (SEQ ID NO:1348) is encoded by the following transcript(s): M62096_PEA_1_T4 (SEQ ID NO:65), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M62096_PEA_1_T4 (SEQ ID NO:65) is shown in bold; this coding portion starts at position 396 and ends at position 1550. The transcript also has the following SNPs as listed in Table 608 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M62096_PEA_1_P11 (SEQ ID NO:1348) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 608 Nucleic acid SNPs
SNP position on nucleotide sequence    Alternative nucleic acid    Previously known SNP?
92                                     C -> A                      Yes
408                                    G -> A                      Yes
6908                                   G -> T                      No

Variant protein M62096_PEA_1_P12 (SEQ ID NO:1349) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M62096_PEA_1_T5 (SEQ ID NO:66). An alignment is given to the known protein (Kinesin heavy chain isoform 5C (SEQ ID NO:1438)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between M62096_PEA_1_P12 (SEQ ID NO:1349) and KF5C_HUMAN (SEQ ID NO:1438):

1. An isolated chimeric polypeptide encoding for M62096_PEA_1_P12 (SEQ ID NO:1349), comprising a first amino acid sequence being at least 90% homologous to MADPAECSIKVMCRFRPLNEAEILRGDKFIPKFKGDETVVIGQGKPYVFDRVLPPNTTQEQVYNACAKQIVKDVLEGYNGTIFAYGQTSSGKTHTMEGKLHDPQLMGIIPRIAHDIFDHIYSMDENLEFHIKVSYFEIYLDKIRDLLDVSKTNLAVHEDKNRVPYVKGCTERFVSSPEEVMDVIDEGKANRHVAVTNMNEHSSRSHSIFLINIKQENVETEKKLSGKLYLVDLAGSEKVSKTGAEGAVLDEAKNINKSLSALGNVISALAEGTKTHVPYRDSKMTRILQDSLGGNCRTTIVICCSPSVFNEAETKSTLMFGQR corresponding to amino acids 1-323 of KF5C_HUMAN (SEQ ID NO:1438), which also corresponds to amino acids 1-323 of M62096_PEA_1_P12 (SEQ ID NO:1349), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence V corresponding to amino acids 324-324 of M62096_PEA_1_P12 (SEQ ID NO:1349), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.

Variant protein M62096_PEA_1_P12 (SEQ ID NO:1349) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 609 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M62096_PEA_1_P12 (SEQ ID NO:1349) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 609 Amino acid mutations
SNP position(s) on amino acid sequence    Alternative amino acid(s)    Previously known SNP?
5                                         A -> T                       Yes

Variant protein M62096_PEA_1_P12 (SEQ ID NO:1349) is encoded by the following transcript(s): M62096_PEA_1_T5 (SEQ ID NO:66), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M62096_PEA_1_T5 (SEQ ID NO:66) is shown in bold; this coding portion starts at position 378 and ends at position 1349. The transcript also has the following SNPs as listed in Table 610 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M62096_PEA_1_P12 (SEQ ID NO:1349) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 610 Nucleic acid SNPs
SNP position on nucleotide sequence    Alternative nucleic acid    Previously known SNP?
92                                     C -> A                      Yes
390                                    G -> A                      Yes
6784                                   G -> T                      No

As noted above, cluster M62096 features 42 segment(s), which were listed in Table 596 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster M62096_PEA_1_node_0 (SEQ ID NO:616) according to the present invention is supported by 14 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M62096_PEA_1_T4 (SEQ ID NO:65), M62096_PEA_1_T5 (SEQ ID NO:66), M62096_PEA_1_T13 (SEQ ID NO:71) and M62096_PEA_1_T14 (SEQ ID NO:72). Table 611 below describes the starting and ending position of this segment on each transcript.

TABLE 611 Segment location on transcripts
Transcript name                     Segment starting position   Segment ending position
M62096_PEA_1_T4 (SEQ ID NO: 65)     1                           355
M62096_PEA_1_T5 (SEQ ID NO: 66)     1                           355
M62096_PEA_1_T13 (SEQ ID NO: 71)    1                           355
M62096_PEA_1_T14 (SEQ ID NO: 72)    1                           355

Segment cluster M62096_PEA_1_node_2 (SEQ ID NO:617) according to the present invention is supported by 12 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M62096_PEA_1_T4 (SEQ ID NO:65), M62096_PEA_1_T5 (SEQ ID NO:66), M62096_PEA_1_T13 (SEQ ID NO:71) and M62096_PEA_1_T14 (SEQ ID NO:72). Table 612 below describes the starting and ending position of this segment on each transcript.

TABLE 612 Segment location on transcripts
Transcript name                     Segment starting position   Segment ending position
M62096_PEA_1_T4 (SEQ ID NO: 65)     374                         521
M62096_PEA_1_T5 (SEQ ID NO: 66)     356                         503
M62096_PEA_1_T13 (SEQ ID NO: 71)    374                         521
M62096_PEA_1_T14 (SEQ ID NO: 72)    374                         521
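
Because a segment is the same stretch of nucleic acid sequence wherever it occurs, its length should be identical on every transcript even when its coordinates differ. The Python sketch below is a minimal illustration of that check, assuming 1-based inclusive coordinates; the dictionary layout and function name are hypothetical, and the values are those given for segment M62096_PEA_1_node_2.

```python
from typing import Dict, Tuple

# (starting position, ending position) of segment M62096_PEA_1_node_2
# on each transcript, as listed for this segment.
NODE_2_LOCATIONS: Dict[str, Tuple[int, int]] = {
    "M62096_PEA_1_T4": (374, 521),
    "M62096_PEA_1_T5": (356, 503),
    "M62096_PEA_1_T13": (374, 521),
    "M62096_PEA_1_T14": (374, 521),
}


def segment_lengths(locations: Dict[str, Tuple[int, int]]) -> Dict[str, int]:
    """Length in nucleotides of the segment on each transcript."""
    return {name: end - start + 1 for name, (start, end) in locations.items()}


lengths = segment_lengths(NODE_2_LOCATIONS)
print(lengths)
print(len(set(lengths.values())) == 1)  # True when every transcript agrees
```

The same check can be applied to any of the segment location tables in this section to catch transcription errors in the coordinates.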

Segment cluster M62096_PEA_1_node_15 (SEQ ID NO:618) according to the present invention is supported by 28 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M62096_PEA_1_T4 (SEQ ID NO:65), M62096_PEA_1_T5 (SEQ ID NO:66), M62096_PEA_1_T13 (SEQ ID NO:71) and M62096_PEA_1_T14 (SEQ ID NO:72). Table 613 below describes the starting and ending position of this segment on each transcript.

TABLE 613 Segment location on transcripts
Transcript name                     Segment starting position   Segment ending position
M62096_PEA_1_T4 (SEQ ID NO: 65)     985                         1109
M62096_PEA_1_T5 (SEQ ID NO: 66)     967                         1091
M62096_PEA_1_T13 (SEQ ID NO: 71)    985                         1109
M62096_PEA_1_T14 (SEQ ID NO: 72)    985                         1109

Segment cluster M62096_PEA_1_node_17 (SEQ ID NO:619) according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M62096_PEA_1_T7 (SEQ ID NO:68). Table 614 below describes the starting and ending position of this segment on each transcript.

TABLE 614 Segment location on transcripts
Transcript name                     Segment starting position   Segment ending position
M62096_PEA_1_T7 (SEQ ID NO: 68)     1                           147

Segment cluster M62096_PEA_1_node_19 (SEQ ID NO:620) according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M62096_PEA_1_T6 (SEQ ID NO:67) and M62096_PEA_1_T9 (SEQ ID NO:69). Table 615 below describes the starting and ending position of this segment on each transcript.

TABLE 615 Segment location on transcripts
Transcript name                     Segment starting position   Segment ending position
M62096_PEA_1_T6 (SEQ ID NO: 67)     1                           125
M62096_PEA_1_T9 (SEQ ID NO: 69)     1                           125

Segment cluster M62096_PEA_1_node_23 (SEQ ID NO:621) according to the present invention is supported by 36 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M62096_PEA_1_T4 (SEQ ID NO:65), M62096_PEA_1_T5 (SEQ ID NO:66), M62096_PEA_1_T6 (SEQ ID NO:67), M62096_PEA_1_T7 (SEQ ID NO:68), M62096_PEA_1_T9 (SEQ ID NO:69), M62096_PEA_1_T13 (SEQ ID NO:71) and M62096_PEA_1_T14 (SEQ ID NO:72). Table 616 below describes the starting and ending position of this segment on each transcript.

TABLE 616 Segment location on transcripts
Transcript name                     Segment starting position   Segment ending position
M62096_PEA_1_T4 (SEQ ID NO: 65)     1215                        1363
M62096_PEA_1_T5 (SEQ ID NO: 66)     1197                        1345
M62096_PEA_1_T6 (SEQ ID NO: 67)     231                         379
M62096_PEA_1_T7 (SEQ ID NO: 68)     253                         401
M62096_PEA_1_T9 (SEQ ID NO: 69)     231                         379
M62096_PEA_1_T13 (SEQ ID NO: 71)    1215                        1363
M62096_PEA_1_T14 (SEQ ID NO: 72)    1215                        1363

Segment cluster M62096_PEA_1_node_27 (SEQ ID NO:623) according to the present invention is supported by 35 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M62096_PEA_1_T4 (SEQ ID NO:65), M62096_PEA_1_T5 (SEQ ID NO:66), M62096_PEA_1_T6 (SEQ ID NO:67), M62096_PEA_1_T7 (SEQ ID NO:68), M62096_PEA_1_T9 (SEQ ID NO:69), M62096_PEA_1_T13 (SEQ ID NO:71) and M62096_PEA_1_T14 (SEQ ID NO:72). Table 617 below describes the starting and ending position of this segment on each transcript.

TABLE 617 Segment location on transcripts
Transcript name                     Segment starting position   Segment ending position
M62096_PEA_1_T4 (SEQ ID NO: 65)     1364                        1512
M62096_PEA_1_T5 (SEQ ID NO: 66)     1407                        1555
M62096_PEA_1_T6 (SEQ ID NO: 67)     380                         528
M62096_PEA_1_T7 (SEQ ID NO: 68)     402                         550
M62096_PEA_1_T9 (SEQ ID NO: 69)     441                         589
M62096_PEA_1_T13 (SEQ ID NO: 71)    1364                        1512
M62096_PEA_1_T14 (SEQ ID NO: 72)    1364                        1512

Segment cluster M62096_PEA_1_node_29 (SEQ ID NO:624) according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M62096_PEA_1_T4 (SEQ ID NO:65). Table 618 below describes the starting and ending position of this segment on each transcript.

TABLE 618 Segment location on transcripts
Transcript name                     Segment starting position   Segment ending position
M62096_PEA_1_T4 (SEQ ID NO: 65)     1513                        1679

Segment cluster M62096_PEA_1_node_31 (SEQ ID NO:625) according to the present invention is supported by 24 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M62096_PEA_1_T4 (SEQ ID NO:65), M62096_PEA_1_T5 (SEQ ID NO:66), M62096_PEA_1_T6 (SEQ ID NO:67), M62096_PEA_1_T7 (SEQ ID NO:68), M62096_PEA_1_T9 (SEQ ID NO:69), M62096_PEA_1_T13 (SEQ ID NO:71) and M62096_PEA_1_T14 (SEQ ID NO:72). Table 619 below describes the starting and ending position of this segment on each transcript.

TABLE 619 Segment location on transcripts
Transcript name                     Segment starting position   Segment ending position
M62096_PEA_1_T4 (SEQ ID NO: 65)     1680                        1855
M62096_PEA_1_T5 (SEQ ID NO: 66)     1556                        1731
M62096_PEA_1_T6 (SEQ ID NO: 67)     529                         704
M62096_PEA_1_T7 (SEQ ID NO: 68)     551                         726
M62096_PEA_1_T9 (SEQ ID NO: 69)     590                         765
M62096_PEA_1_T13 (SEQ ID NO: 71)    1513                        1688
M62096_PEA_1_T14 (SEQ ID NO: 72)    1513                        1688

Segment cluster M62096_PEA_1_node_34 (SEQ ID NO:626) according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M62096_PEA_1_T14 (SEQ ID NO:72). Table 620 below describes the starting and ending position of this segment on each transcript.

TABLE 620 Segment location on transcripts
Transcript name                     Segment starting position   Segment ending position
M62096_PEA_1_T14 (SEQ ID NO: 72)    1758                        2261

Segment cluster M62096_PEA_1_node_36 (SEQ ID NO:627) according to the present invention is supported by 26 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M62096_PEA_1_T4 (SEQ ID NO:65), M62096_PEA_1_T5 (SEQ ID NO:66), M62096_PEA_1_T6 (SEQ ID NO:67), M62096_PEA_1_T7 (SEQ ID NO:68), M62096_PEA_1_T9 (SEQ ID NO:69) and M62096_PEA_1_T13 (SEQ ID NO:71). Table 621 below describes the starting and ending position of this segment on each transcript.

TABLE 621 Segment location on transcripts
Transcript name                     Segment starting position   Segment ending position
M62096_PEA_1_T4 (SEQ ID NO: 65)     1925                        2131
M62096_PEA_1_T5 (SEQ ID NO: 66)     1801                        2007
M62096_PEA_1_T6 (SEQ ID NO: 67)     774                         980
M62096_PEA_1_T7 (SEQ ID NO: 68)     796                         1002
M62096_PEA_1_T9 (SEQ ID NO: 69)     835                         1041
M62096_PEA_1_T13 (SEQ ID NO: 71)    1758                        1964

Segment cluster M62096_PEA_1_node_38 (SEQ ID NO:628) according to the present invention is supported by 24 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M62096_PEA_1_T4 (SEQ ID NO:65), M62096_PEA_1_T5 (SEQ ID NO:66), M62096_PEA_1_T6 (SEQ ID NO:67), M62096_PEA_1_T7 (SEQ ID NO:68), M62096_PEA_1_T9 (SEQ ID NO:69) and M62096_PEA_1_T13 (SEQ ID NO:71). Table 622 below describes the starting and ending position of this segment on each transcript.

TABLE 622 Segment location on transcripts
Transcript name                     Segment starting position   Segment ending position
M62096_PEA_1_T4 (SEQ ID NO: 65)     2132                        2278
M62096_PEA_1_T5 (SEQ ID NO: 66)     2008                        2154
M62096_PEA_1_T6 (SEQ ID NO: 67)     981                         1127
M62096_PEA_1_T7 (SEQ ID NO: 68)     1003                        1149
M62096_PEA_1_T9 (SEQ ID NO: 69)     1042                        1188
M62096_PEA_1_T13 (SEQ ID NO: 71)    1965                        2111

Segment cluster M62096_PEA_1_node_40 (SEQ ID NO:629) according to the present invention is supported by 21 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M62096_PEA_1_T4 (SEQ ID NO:65), M62096_PEA_1_T5 (SEQ ID NO:66), M62096_PEA_1_T6 (SEQ ID NO:67), M62096_PEA_1_T7 (SEQ ID NO:68), M62096_PEA_1_T9 (SEQ ID NO:69) and M62096_PEA_1_T13 (SEQ ID NO:71). Table 623 below describes the starting and ending position of this segment on each transcript.

TABLE 623 Segment location on transcripts
Transcript name                     Segment starting position   Segment ending position
M62096_PEA_1_T4 (SEQ ID NO: 65)     2279                        2467
M62096_PEA_1_T5 (SEQ ID NO: 66)     2155                        2343
M62096_PEA_1_T6 (SEQ ID NO: 67)     1128                        1316
M62096_PEA_1_T7 (SEQ ID NO: 68)     1150                        1338
M62096_PEA_1_T9 (SEQ ID NO: 69)     1189                        1377
M62096_PEA_1_T13 (SEQ ID NO: 71)    2112                        2300

Segment cluster M62096_PEA_1_node_48 (SEQ ID NO:630) according to the present invention is supported by 7 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M62096_PEA_1_T13 (SEQ ID NO:71). Table 624 below describes the starting and ending position of this segment on each transcript.

TABLE 624 Segment location on transcripts
Transcript name                     Segment starting position   Segment ending position
M62096_PEA_1_T13 (SEQ ID NO: 71)    2606                        2945

Segment cluster M62096_PEA_1_node_50 (SEQ ID NO:631) according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M62096_PEA_1_T11 (SEQ ID NO:70) and M62096_PEA_1_T15 (SEQ ID NO:73). Table 625 below describes the starting and ending position of this segment on each transcript.

TABLE 625 Segment location on transcripts
Transcript name                     Segment starting position   Segment ending position
M62096_PEA_1_T11 (SEQ ID NO: 70)    1                           688
M62096_PEA_1_T15 (SEQ ID NO: 73)    1                           688

Segment cluster M62096_PEA_1_node_56 (SEQ ID NO:632) according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M62096_PEA_1_T15 (SEQ ID NO:73). Table 626 below describes the starting and ending position of this segment on each transcript.

TABLE 626 Segment location on transcripts
Transcript name                     Segment starting position   Segment ending position
M62096_PEA_1_T15 (SEQ ID NO: 73)    924                         1059

Segment cluster M62096_PEA_1_node_60 (SEQ ID NO:633) according to the present invention is supported by 13 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M62096_PEA_1_T4 (SEQ ID NO:65), M62096_PEA_1_T5 (SEQ ID NO:66), M62096_PEA_1_T6 (SEQ ID NO:67), M62096_PEA_1_T7 (SEQ ID NO:68), M62096_PEA_1_T9 (SEQ ID NO:69) and M62096_PEA_1_T11 (SEQ ID NO:70). Table 627 below describes the starting and ending position of this segment on each transcript.

TABLE 627 Segment location on transcripts
Transcript name                     Segment starting position   Segment ending position
M62096_PEA_1_T4 (SEQ ID NO: 65)     3113                        3329
M62096_PEA_1_T5 (SEQ ID NO: 66)     2989                        3205
M62096_PEA_1_T6 (SEQ ID NO: 67)     1962                        2178
M62096_PEA_1_T7 (SEQ ID NO: 68)     1984                        2200
M62096_PEA_1_T9 (SEQ ID NO: 69)     2023                        2239
M62096_PEA_1_T11 (SEQ ID NO: 70)    1029                        1245

Segment cluster M62096_PEA_1_node_65 (SEQ ID NO:634) according to the present invention is supported by 51 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M62096_PEA_1_T4 (SEQ ID NO:65), M62096_PEA_1_T5 (SEQ ID NO:66), M62096_PEA_1_T6 (SEQ ID NO:67), M62096_PEA_1_T7 (SEQ ID NO:68), M62096_PEA_1_T9 (SEQ ID NO:69) and M62096_PEA_1_T11 (SEQ ID NO:70). Table 628 below describes the starting and ending position of this segment on each transcript.

TABLE 628 Segment location on transcripts
Transcript name                     Segment starting position   Segment ending position
M62096_PEA_1_T4 (SEQ ID NO: 65)     3444                        4763
M62096_PEA_1_T5 (SEQ ID NO: 66)     3320                        4639
M62096_PEA_1_T6 (SEQ ID NO: 67)     2293                        3612
M62096_PEA_1_T7 (SEQ ID NO: 68)     2315                        3634
M62096_PEA_1_T9 (SEQ ID NO: 69)     2354                        3673
M62096_PEA_1_T11 (SEQ ID NO: 70)    1360                        2679

Segment cluster M62096_PEA_1_node_69 (SEQ ID NO:635) according to the present invention is supported by 85 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M62096_PEA_1_T4 (SEQ ID NO:65), M62096_PEA_1_T5 (SEQ ID NO:66), M62096_PEA_1_T6 (SEQ ID NO:67), M62096_PEA_1_T7 (SEQ ID NO:68), M62096_PEA_1_T9 (SEQ ID NO:69) and M62096_PEA_1_T11 (SEQ ID NO:70). Table 629 below describes the starting and ending position of this segment on each transcript.

TABLE 629 Segment location on transcripts
Transcript name                     Segment starting position   Segment ending position
M62096_PEA_1_T4 (SEQ ID NO: 65)     4894                        5826
M62096_PEA_1_T5 (SEQ ID NO: 66)     4770                        5702
M62096_PEA_1_T6 (SEQ ID NO: 67)     3743                        4675
M62096_PEA_1_T7 (SEQ ID NO: 68)     3765                        4697
M62096_PEA_1_T9 (SEQ ID NO: 69)     3804                        4736
M62096_PEA_1_T11 (SEQ ID NO: 70)    2810                        3742

Segment cluster M62096_PEA_1_node_71 (SEQ ID NO:636) according to the present invention is supported by 178 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M62096_PEA_1_T4 (SEQ ID NO:65), M62096_PEA_1_T5 (SEQ ID NO:66), M62096_PEA_1_T6 (SEQ ID NO:67), M62096_PEA_1_T7 (SEQ ID NO:68), M62096_PEA_1_T9 (SEQ ID NO:69) and M62096_PEA_1_T11 (SEQ ID NO:70). Table 630 below describes the starting and ending position of this segment on each transcript.

TABLE 630 Segment location on transcripts
Transcript name                    Segment starting position  Segment ending position
M62096_PEA_1_T4 (SEQ ID NO: 65)    5882                       7128
M62096_PEA_1_T5 (SEQ ID NO: 66)    5758                       7004
M62096_PEA_1_T6 (SEQ ID NO: 67)    4731                       5977
M62096_PEA_1_T7 (SEQ ID NO: 68)    4753                       5999
M62096_PEA_1_T9 (SEQ ID NO: 69)    4792                       6038
M62096_PEA_1_T11 (SEQ ID NO: 70)   3798                       5044

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.

Segment cluster M62096_PEA_1_node_1 (SEQ ID NO:637) according to the present invention can be found in the following transcript(s): M62096_PEA_1_T4 (SEQ ID NO:65), M62096_PEA_1_T13 (SEQ ID NO:71) and M62096_PEA_1_T14 (SEQ ID NO:72). Table 631 below describes the starting and ending position of this segment on each transcript.

TABLE 631 Segment location on transcripts
Transcript name                    Segment starting position  Segment ending position
M62096_PEA_1_T4 (SEQ ID NO: 65)    356                        373
M62096_PEA_1_T13 (SEQ ID NO: 71)   356                        373
M62096_PEA_1_T14 (SEQ ID NO: 72)   356                        373

Segment cluster M62096_PEA_1_node_4 (SEQ ID NO:638) according to the present invention is supported by 12 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M62096_PEA_1_T4 (SEQ ID NO:65), M62096_PEA_1_T5 (SEQ ID NO:66), M62096_PEA_1_T13 (SEQ ID NO:71) and M62096_PEA_1_T14 (SEQ ID NO:72). Table 632 below describes the starting and ending position of this segment on each transcript.

TABLE 632 Segment location on transcripts
Transcript name                    Segment starting position  Segment ending position
M62096_PEA_1_T4 (SEQ ID NO: 65)    522                        612
M62096_PEA_1_T5 (SEQ ID NO: 66)    504                        594
M62096_PEA_1_T13 (SEQ ID NO: 71)   522                        612
M62096_PEA_1_T14 (SEQ ID NO: 72)   522                        612

Segment cluster M62096_PEA_1_node_6 (SEQ ID NO:639) according to the present invention is supported by 13 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M62096_PEA_1_T4 (SEQ ID NO:65), M62096_PEA_1_T5 (SEQ ID NO:66), M62096_PEA_1_T13 (SEQ ID NO:71) and M62096_PEA_1_T14 (SEQ ID NO:72). Table 633 below describes the starting and ending position of this segment on each transcript.

TABLE 633 Segment location on transcripts
Transcript name                    Segment starting position  Segment ending position
M62096_PEA_1_T4 (SEQ ID NO: 65)    613                        686
M62096_PEA_1_T5 (SEQ ID NO: 66)    595                        668
M62096_PEA_1_T13 (SEQ ID NO: 71)   613                        686
M62096_PEA_1_T14 (SEQ ID NO: 72)   613                        686

Segment cluster M62096_PEA_1_node_7 (SEQ ID NO:640) according to the present invention is supported by 19 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M62096_PEA_1_T4 (SEQ ID NO:65), M62096_PEA_1_T5 (SEQ ID NO:66), M62096_PEA_1_T13 (SEQ ID NO:71) and M62096_PEA_1_T14 (SEQ ID NO:72). Table 634 below describes the starting and ending position of this segment on each transcript.

TABLE 634 Segment location on transcripts
Transcript name                    Segment starting position  Segment ending position
M62096_PEA_1_T4 (SEQ ID NO: 65)    687                        791
M62096_PEA_1_T5 (SEQ ID NO: 66)    669                        773
M62096_PEA_1_T13 (SEQ ID NO: 71)   687                        791
M62096_PEA_1_T14 (SEQ ID NO: 72)   687                        791

Segment cluster M62096_PEA_1_node_9 (SEQ ID NO:641) according to the present invention is supported by 18 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M62096_PEA_1_T4 (SEQ ID NO:65), M62096_PEA_1_T5 (SEQ ID NO:66), M62096_PEA_1_T13 (SEQ ID NO:71) and M62096_PEA_1_T14 (SEQ ID NO:72). Table 635 below describes the starting and ending position of this segment on each transcript.

TABLE 635 Segment location on transcripts
Transcript name                    Segment starting position  Segment ending position
M62096_PEA_1_T4 (SEQ ID NO: 65)    792                        840
M62096_PEA_1_T5 (SEQ ID NO: 66)    774                        822
M62096_PEA_1_T13 (SEQ ID NO: 71)   792                        840
M62096_PEA_1_T14 (SEQ ID NO: 72)   792                        840

Segment cluster M62096_PEA_1_node_11 (SEQ ID NO:642) according to the present invention is supported by 22 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M62096_PEA_1_T4 (SEQ ID NO:65), M62096_PEA_1_T5 (SEQ ID NO:66), M62096_PEA_1_T13 (SEQ ID NO:71) and M62096_PEA_1_T14 (SEQ ID NO:72). Table 636 below describes the starting and ending position of this segment on each transcript.

TABLE 636 Segment location on transcripts
Transcript name                    Segment starting position  Segment ending position
M62096_PEA_1_T4 (SEQ ID NO: 65)    841                        896
M62096_PEA_1_T5 (SEQ ID NO: 66)    823                        878
M62096_PEA_1_T13 (SEQ ID NO: 71)   841                        896
M62096_PEA_1_T14 (SEQ ID NO: 72)   841                        896

Segment cluster M62096_PEA_1_node_13 (SEQ ID NO:643) according to the present invention is supported by 24 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M62096_PEA_1_T4 (SEQ ID NO:65), M62096_PEA_1_T5 (SEQ ID NO:66), M62096_PEA_1_T13 (SEQ ID NO:71) and M62096_PEA_1_T14 (SEQ ID NO:72). Table 637 below describes the starting and ending position of this segment on each transcript.

TABLE 637 Segment location on transcripts
Transcript name                    Segment starting position  Segment ending position
M62096_PEA_1_T4 (SEQ ID NO: 65)    897                        984
M62096_PEA_1_T5 (SEQ ID NO: 66)    879                        966
M62096_PEA_1_T13 (SEQ ID NO: 71)   897                        984
M62096_PEA_1_T14 (SEQ ID NO: 72)   897                        984

Segment cluster M62096_PEA_1_node_21 (SEQ ID NO:644) according to the present invention is supported by 33 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M62096_PEA_1_T4 (SEQ ID NO:65), M62096_PEA_1_T5 (SEQ ID NO:66), M62096_PEA_1_T6 (SEQ ID NO:67), M62096_PEA_1_T7 (SEQ ID NO:68), M62096_PEA_1_T9 (SEQ ID NO:69), M62096_PEA_1_T13 (SEQ ID NO:71) and M62096_PEA_1_T14 (SEQ ID NO:72). Table 638 below describes the starting and ending position of this segment on each transcript.

TABLE 638 Segment location on transcripts
Transcript name                    Segment starting position  Segment ending position
M62096_PEA_1_T4 (SEQ ID NO: 65)    1110                       1214
M62096_PEA_1_T5 (SEQ ID NO: 66)    1092                       1196
M62096_PEA_1_T6 (SEQ ID NO: 67)    126                        230
M62096_PEA_1_T7 (SEQ ID NO: 68)    148                        252
M62096_PEA_1_T9 (SEQ ID NO: 69)    126                        230
M62096_PEA_1_T13 (SEQ ID NO: 71)   1110                       1214
M62096_PEA_1_T14 (SEQ ID NO: 72)   1110                       1214

Segment cluster M62096_PEA_1_node_25 (SEQ ID NO:645) according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M62096_PEA_1_T5 (SEQ ID NO:66) and M62096_PEA_1_T9 (SEQ ID NO:69). Table 639 below describes the starting and ending position of this segment on each transcript.

TABLE 639 Segment location on transcripts
Transcript name                    Segment starting position  Segment ending position
M62096_PEA_1_T5 (SEQ ID NO: 66)    1346                       1406
M62096_PEA_1_T9 (SEQ ID NO: 69)    380                        440

Segment cluster M62096_PEA_1_node_33 (SEQ ID NO:646) according to the present invention is supported by 20 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M62096_PEA_1_T4 (SEQ ID NO:65), M62096_PEA_1_T5 (SEQ ID NO:66), M62096_PEA_1_T6 (SEQ ID NO:67), M62096_PEA_1_T7 (SEQ ID NO:68), M62096_PEA_1_T9 (SEQ ID NO:69), M62096_PEA_1_T13 (SEQ ID NO:71) and M62096_PEA_1_T14 (SEQ ID NO:72). Table 640 below describes the starting and ending position of this segment on each transcript.

TABLE 640 Segment location on transcripts
Transcript name                    Segment starting position  Segment ending position
M62096_PEA_1_T4 (SEQ ID NO: 65)    1856                       1924
M62096_PEA_1_T5 (SEQ ID NO: 66)    1732                       1800
M62096_PEA_1_T6 (SEQ ID NO: 67)    705                        773
M62096_PEA_1_T7 (SEQ ID NO: 68)    727                        795
M62096_PEA_1_T9 (SEQ ID NO: 69)    766                        834
M62096_PEA_1_T13 (SEQ ID NO: 71)   1689                       1757
M62096_PEA_1_T14 (SEQ ID NO: 72)   1689                       1757

Segment cluster M62096_PEA_1_node_42 (SEQ ID NO:647) according to the present invention is supported by 17 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M62096_PEA_1_T4 (SEQ ID NO:65), M62096_PEA_1_T5 (SEQ ID NO:66), M62096_PEA_1_T6 (SEQ ID NO:67), M62096_PEA_1_T7 (SEQ ID NO:68), M62096_PEA_1_T9 (SEQ ID NO:69) and M62096_PEA_1_T13 (SEQ ID NO:71). Table 641 below describes the starting and ending position of this segment on each transcript.

TABLE 641 Segment location on transcripts
Transcript name                    Segment starting position  Segment ending position
M62096_PEA_1_T4 (SEQ ID NO: 65)    2468                       2585
M62096_PEA_1_T5 (SEQ ID NO: 66)    2344                       2461
M62096_PEA_1_T6 (SEQ ID NO: 67)    1317                       1434
M62096_PEA_1_T7 (SEQ ID NO: 68)    1339                       1456
M62096_PEA_1_T9 (SEQ ID NO: 69)    1378                       1495
M62096_PEA_1_T13 (SEQ ID NO: 71)   2301                       2418

Segment cluster M62096_PEA_1_node_44 (SEQ ID NO:648) according to the present invention is supported by 19 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M62096_PEA_1_T4 (SEQ ID NO:65), M62096_PEA_1_T5 (SEQ ID NO:66), M62096_PEA_1_T6 (SEQ ID NO:67), M62096_PEA_1_T7 (SEQ ID NO:68), M62096_PEA_1_T9 (SEQ ID NO:69) and M62096_PEA_1_T13 (SEQ ID NO:71). Table 642 below describes the starting and ending position of this segment on each transcript.

TABLE 642 Segment location on transcripts
Transcript name                    Segment starting position  Segment ending position
M62096_PEA_1_T4 (SEQ ID NO: 65)    2586                       2662
M62096_PEA_1_T5 (SEQ ID NO: 66)    2462                       2538
M62096_PEA_1_T6 (SEQ ID NO: 67)    1435                       1511
M62096_PEA_1_T7 (SEQ ID NO: 68)    1457                       1533
M62096_PEA_1_T9 (SEQ ID NO: 69)    1496                       1572
M62096_PEA_1_T13 (SEQ ID NO: 71)   2419                       2495

Segment cluster M62096_PEA_1_node_47 (SEQ ID NO:649) according to the present invention is supported by 21 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M62096_PEA_1_T4 (SEQ ID NO:65), M62096_PEA_1_T5 (SEQ ID NO:66), M62096_PEA_1_T6 (SEQ ID NO:67), M62096_PEA_1_T7 (SEQ ID NO:68), M62096_PEA_1_T9 (SEQ ID NO:69) and M62096_PEA_1_T13 (SEQ ID NO:71). Table 643 below describes the starting and ending position of this segment on each transcript.

TABLE 643 Segment location on transcripts
Transcript name                    Segment starting position  Segment ending position
M62096_PEA_1_T4 (SEQ ID NO: 65)    2663                       2772
M62096_PEA_1_T5 (SEQ ID NO: 66)    2539                       2648
M62096_PEA_1_T6 (SEQ ID NO: 67)    1512                       1621
M62096_PEA_1_T7 (SEQ ID NO: 68)    1534                       1643
M62096_PEA_1_T9 (SEQ ID NO: 69)    1573                       1682
M62096_PEA_1_T13 (SEQ ID NO: 71)   2496                       2605

Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment (in relation to lung cancer), shown in Table 644.

TABLE 644 Oligonucleotides related to this segment
Oligonucleotide name             Overexpressed in cancers   Chip reference
M62096_0_7_0 (SEQ ID NO: 231)    lung malignant tumors      LUN

Segment cluster M62096_PEA_1_node_51 (SEQ ID NO:650) according to the present invention is supported by 11 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M62096_PEA_1_T4 (SEQ ID NO:65), M62096_PEA_1_T5 (SEQ ID NO:66), M62096_PEA_1_T6 (SEQ ID NO:67), M62096_PEA_1_T7 (SEQ ID NO:68), M62096_PEA_1_T9 (SEQ ID NO:69), M62096_PEA_1_T11 (SEQ ID NO:70) and M62096_PEA_1_T15 (SEQ ID NO:73). Table 645 below describes the starting and ending position of this segment on each transcript.

TABLE 645 Segment location on transcripts
Transcript name                    Segment starting position  Segment ending position
M62096_PEA_1_T4 (SEQ ID NO: 65)    2773                       2874
M62096_PEA_1_T5 (SEQ ID NO: 66)    2649                       2750
M62096_PEA_1_T6 (SEQ ID NO: 67)    1622                       1723
M62096_PEA_1_T7 (SEQ ID NO: 68)    1644                       1745
M62096_PEA_1_T9 (SEQ ID NO: 69)    1683                       1784
M62096_PEA_1_T11 (SEQ ID NO: 70)   689                        790
M62096_PEA_1_T15 (SEQ ID NO: 73)   689                        790

Segment cluster M62096_PEA_1_node_53 (SEQ ID NO:651) according to the present invention is supported by 10 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M62096_PEA_1_T4 (SEQ ID NO:65), M62096_PEA_1_T5 (SEQ ID NO:66), M62096_PEA_1_T6 (SEQ ID NO:67), M62096_PEA_1_T7 (SEQ ID NO:68), M62096_PEA_1_T9 (SEQ ID NO:69), M62096_PEA_1_T11 (SEQ ID NO:70) and M62096_PEA_1_T15 (SEQ ID NO:73). Table 646 below describes the starting and ending position of this segment on each transcript.

TABLE 646 Segment location on transcripts
Transcript name                    Segment starting position  Segment ending position
M62096_PEA_1_T4 (SEQ ID NO: 65)    2875                       2935
M62096_PEA_1_T5 (SEQ ID NO: 66)    2751                       2811
M62096_PEA_1_T6 (SEQ ID NO: 67)    1724                       1784
M62096_PEA_1_T7 (SEQ ID NO: 68)    1746                       1806
M62096_PEA_1_T9 (SEQ ID NO: 69)    1785                       1845
M62096_PEA_1_T11 (SEQ ID NO: 70)   791                        851
M62096_PEA_1_T15 (SEQ ID NO: 73)   791                        851

Segment cluster M62096_PEA_1_node_55 (SEQ ID NO:652) according to the present invention is supported by 9 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M62096_PEA_1_T4 (SEQ ID NO:65), M62096_PEA_1_T5 (SEQ ID NO:66), M62096_PEA_1_T6 (SEQ ID NO:67), M62096_PEA_1_T7 (SEQ ID NO:68), M62096_PEA_1_T9 (SEQ ID NO:69), M62096_PEA_1_T11 (SEQ ID NO:70) and M62096_PEA_1_T15 (SEQ ID NO:73). Table 647 below describes the starting and ending position of this segment on each transcript.

TABLE 647 Segment location on transcripts
Transcript name                    Segment starting position  Segment ending position
M62096_PEA_1_T4 (SEQ ID NO: 65)    2936                       3007
M62096_PEA_1_T5 (SEQ ID NO: 66)    2812                       2883
M62096_PEA_1_T6 (SEQ ID NO: 67)    1785                       1856
M62096_PEA_1_T7 (SEQ ID NO: 68)    1807                       1878
M62096_PEA_1_T9 (SEQ ID NO: 69)    1846                       1917
M62096_PEA_1_T11 (SEQ ID NO: 70)   852                        923
M62096_PEA_1_T15 (SEQ ID NO: 73)   852                        923

Segment cluster M62096_PEA_1_node_58 (SEQ ID NO:653) according to the present invention is supported by 9 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M62096_PEA_1_T4 (SEQ ID NO:65), M62096_PEA_1_T5 (SEQ ID NO:66), M62096_PEA_1_T6 (SEQ ID NO:67), M62096_PEA_1_T7 (SEQ ID NO:68), M62096_PEA_1_T9 (SEQ ID NO:69) and M62096_PEA_1_T11 (SEQ ID NO:70). Table 648 below describes the starting and ending position of this segment on each transcript.

TABLE 648 Segment location on transcripts
Transcript name                    Segment starting position  Segment ending position
M62096_PEA_1_T4 (SEQ ID NO: 65)    3008                       3112
M62096_PEA_1_T5 (SEQ ID NO: 66)    2884                       2988
M62096_PEA_1_T6 (SEQ ID NO: 67)    1857                       1961
M62096_PEA_1_T7 (SEQ ID NO: 68)    1879                       1983
M62096_PEA_1_T9 (SEQ ID NO: 69)    1918                       2022
M62096_PEA_1_T11 (SEQ ID NO: 70)   924                        1028

Segment cluster M62096_PEA_1_node_62 (SEQ ID NO:654) according to the present invention is supported by 14 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M62096_PEA_1_T4 (SEQ ID NO:65), M62096_PEA_1_T5 (SEQ ID NO:66), M62096_PEA_1_T6 (SEQ ID NO:67), M62096_PEA_1_T7 (SEQ ID NO:68), M62096_PEA_1_T9 (SEQ ID NO:69) and M62096_PEA_1_T11 (SEQ ID NO:70). Table 649 below describes the starting and ending position of this segment on each transcript.

TABLE 649 Segment location on transcripts
Transcript name                    Segment starting position  Segment ending position
M62096_PEA_1_T4 (SEQ ID NO: 65)    3330                       3443
M62096_PEA_1_T5 (SEQ ID NO: 66)    3206                       3319
M62096_PEA_1_T6 (SEQ ID NO: 67)    2179                       2292
M62096_PEA_1_T7 (SEQ ID NO: 68)    2201                       2314
M62096_PEA_1_T9 (SEQ ID NO: 69)    2240                       2353
M62096_PEA_1_T11 (SEQ ID NO: 70)   1246                       1359

Segment cluster M62096_PEA_1_node_66 (SEQ ID NO:655) according to the present invention is supported by 23 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M62096_PEA_1_T4 (SEQ ID NO:65), M62096_PEA_1_T5 (SEQ ID NO:66), M62096_PEA_1_T6 (SEQ ID NO:67), M62096_PEA_1_T7 (SEQ ID NO:68), M62096_PEA_1_T9 (SEQ ID NO:69) and M62096_PEA_1_T11 (SEQ ID NO:70). Table 650 below describes the starting and ending position of this segment on each transcript.

TABLE 650 Segment location on transcripts
Transcript name                    Segment starting position  Segment ending position
M62096_PEA_1_T4 (SEQ ID NO: 65)    4764                       4881
M62096_PEA_1_T5 (SEQ ID NO: 66)    4640                       4757
M62096_PEA_1_T6 (SEQ ID NO: 67)    3613                       3730
M62096_PEA_1_T7 (SEQ ID NO: 68)    3635                       3752
M62096_PEA_1_T9 (SEQ ID NO: 69)    3674                       3791
M62096_PEA_1_T11 (SEQ ID NO: 70)   2680                       2797

Segment cluster M62096_PEA_1_node_67 (SEQ ID NO:656) according to the present invention can be found in the following transcript(s): M62096_PEA_1_T4 (SEQ ID NO:65), M62096_PEA_1_T5 (SEQ ID NO:66), M62096_PEA_1_T6 (SEQ ID NO:67), M62096_PEA_1_T7 (SEQ ID NO:68), M62096_PEA_1_T9 (SEQ ID NO:69) and M62096_PEA_1_T11 (SEQ ID NO:70). Table 651 below describes the starting and ending position of this segment on each transcript.

TABLE 651 Segment location on transcripts
Transcript name                    Segment starting position  Segment ending position
M62096_PEA_1_T4 (SEQ ID NO: 65)    4882                       4887
M62096_PEA_1_T5 (SEQ ID NO: 66)    4758                       4763
M62096_PEA_1_T6 (SEQ ID NO: 67)    3731                       3736
M62096_PEA_1_T7 (SEQ ID NO: 68)    3753                       3758
M62096_PEA_1_T9 (SEQ ID NO: 69)    3792                       3797
M62096_PEA_1_T11 (SEQ ID NO: 70)   2798                       2803

Segment cluster M62096_PEA_1_node_68 (SEQ ID NO:657) according to the present invention can be found in the following transcript(s): M62096_PEA_1_T4 (SEQ ID NO:65), M62096_PEA_1_T5 (SEQ ID NO:66), M62096_PEA_1_T6 (SEQ ID NO:67), M62096_PEA_1_T7 (SEQ ID NO:68), M62096_PEA_1_T9 (SEQ ID NO:69) and M62096_PEA_1_T11 (SEQ ID NO:70). Table 652 below describes the starting and ending position of this segment on each transcript.

TABLE 652 Segment location on transcripts
Transcript name                    Segment starting position  Segment ending position
M62096_PEA_1_T4 (SEQ ID NO: 65)    4888                       4893
M62096_PEA_1_T5 (SEQ ID NO: 66)    4764                       4769
M62096_PEA_1_T6 (SEQ ID NO: 67)    3737                       3742
M62096_PEA_1_T7 (SEQ ID NO: 68)    3759                       3764
M62096_PEA_1_T9 (SEQ ID NO: 69)    3798                       3803
M62096_PEA_1_T11 (SEQ ID NO: 70)   2804                       2809

Segment cluster M62096_PEA_1_node_70 (SEQ ID NO:658) according to the present invention is supported by 55 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M62096_PEA_1_T4 (SEQ ID NO:65), M62096_PEA_1_T5 (SEQ ID NO:66), M62096_PEA_1_T6 (SEQ ID NO:67), M62096_PEA_1_T7 (SEQ ID NO:68), M62096_PEA_1_T9 (SEQ ID NO:69) and M62096_PEA_1_T11 (SEQ ID NO:70). Table 653 below describes the starting and ending position of this segment on each transcript.

TABLE 653 Segment location on transcripts
Transcript name                    Segment starting position  Segment ending position
M62096_PEA_1_T4 (SEQ ID NO: 65)    5827                       5881
M62096_PEA_1_T5 (SEQ ID NO: 66)    5703                       5757
M62096_PEA_1_T6 (SEQ ID NO: 67)    4676                       4730
M62096_PEA_1_T7 (SEQ ID NO: 68)    4698                       4752
M62096_PEA_1_T9 (SEQ ID NO: 69)    4737                       4791
M62096_PEA_1_T11 (SEQ ID NO: 70)   3743                       3797

Variant protein alignment to the previously known protein:

Sequence name: KF5C_HUMAN (SEQ ID NO: 1438)
Sequence documentation:
Alignment of: M62096_PEA_1_P4 (SEQ ID NO: 1341) × KF5C_HUMAN (SEQ ID NO: 1438)
Alignment segment 1/1:
Quality: 6936.00    Escore: 0
Matching length: 719    Total length: 719
Matching Percent Similarity: 100.00    Matching Percent Identity: 100.00
Total Percent Similarity: 100.00    Total Percent Identity: 100.00
Gaps: 0
Alignment:

Sequence name: KF5C_HUMAN (SEQ ID NO: 1438)
Sequence documentation:
Alignment of: M62096_PEA_1_P5 (SEQ ID NO: 1342) × KF5C_HUMAN (SEQ ID NO: 1438)
Alignment segment 1/1:
Quality: 6520.00    Escore: 0
Matching length: 674    Total length: 674
Matching Percent Similarity: 100.00    Matching Percent Identity: 100.00
Total Percent Similarity: 100.00    Total Percent Identity: 100.00
Gaps: 0
Alignment:

Sequence name: KF5C_HUMAN (SEQ ID NO: 1438)
Sequence documentation:
Alignment of: M62096_PEA_1_P3 (SEQ ID NO: 1343) × KF5C_HUMAN (SEQ ID NO: 1438)
Alignment segment 1/1:
Quality: 5726.00    Escore: 0
Matching length: 593    Total length: 593
Matching Percent Similarity: 100.00    Matching Percent Identity: 100.00
Total Percent Similarity: 100.00    Total Percent Identity: 100.00
Gaps: 0
Alignment:

Sequence name: KF5C_HUMAN (SEQ ID NO: 1438)
Sequence documentation:
Alignment of: M62096_PEA_1_P7 (SEQ ID NO: 1344) × KF5C_HUMAN (SEQ ID NO: 1438)
Alignment segment 1/1:
Quality: 2117.00    Escore: 0
Matching length: 220    Total length: 220
Matching Percent Similarity: 100.00    Matching Percent Identity: 100.00
Total Percent Similarity: 100.00    Total Percent Identity: 100.00
Gaps: 0
Alignment:

Sequence name: KF5C_HUMAN (SEQ ID NO: 1438)
Sequence documentation:
Alignment of: M62096_PEA_1_P8 (SEQ ID NO: 1345) × KF5C_HUMAN (SEQ ID NO: 1438)
Alignment segment 1/1:
Quality: 7146.00    Escore: 0
Matching length: 737    Total length: 737
Matching Percent Similarity: 100.00    Matching Percent Identity: 99.86
Total Percent Similarity: 100.00    Total Percent Identity: 99.86
Gaps: 0
Alignment:

Sequence name: KF5C_HUMAN (SEQ ID NO: 1438)
Sequence documentation:
Alignment of: M62096_PEA_1_P9 (SEQ ID NO: 1346) × KF5C_HUMAN (SEQ ID NO: 1438)
Alignment segment 1/1:
Quality: 4434.00    Escore: 0
Matching length: 454    Total length: 454
Matching Percent Similarity: 100.00    Matching Percent Identity: 100.00
Total Percent Similarity: 100.00    Total Percent Identity: 100.00
Gaps: 0
Alignment:

Sequence name: KF5C_HUMAN (SEQ ID NO: 1438)
Sequence documentation:
Alignment of: M62096_PEA_1_P10 (SEQ ID NO: 1347) × KF5C_HUMAN (SEQ ID NO: 1438)
Alignment segment 1/1:
Quality: 747.00    Escore: 0
Matching length: 78    Total length: 78
Matching Percent Similarity: 100.00    Matching Percent Identity: 100.00
Total Percent Similarity: 100.00    Total Percent Identity: 100.00
Gaps: 0
Alignment:

Sequence name: KF5C_HUMAN (SEQ ID NO: 1438)
Sequence documentation:
Alignment of: M62096_PEA_1_P11 (SEQ ID NO: 1348) × KF5C_HUMAN (SEQ ID NO: 1438)
. . .
Alignment segment 1/1:
Quality: 3634.00    Escore: 0
Matching length: 372    Total length: 372
Matching Percent Similarity: 100.00    Matching Percent Identity: 100.00
Total Percent Similarity: 100.00    Total Percent Identity: 100.00
Gaps: 0
Alignment:

Sequence name: KF5C_HUMAN (SEQ ID NO: 1438)
Sequence documentation:
Alignment of: M62096_PEA_1_P12 (SEQ ID NO: 1349) × KF5C_HUMAN (SEQ ID NO: 1438)
. . .
Alignment segment 1/1:
Quality: 3145.00    Escore: 0
Matching length: 323    Total length: 323
Matching Percent Similarity: 100.00    Matching Percent Identity: 100.00
Total Percent Similarity: 100.00    Total Percent Identity: 100.00
Gaps: 0
Alignment:
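The percent identity values reported in the alignment summaries above can be illustrated with a short Python sketch. It assumes the conventional definition, namely identical positions divided by aligned length, expressed as a percentage and rounded to two decimals. The count of 736 identical residues used for M62096_PEA_1_P8 is inferred from the reported 99.86% over 737 positions and is not stated in the application; the function name `percent_identity` is likewise illustrative.

```python
def percent_identity(identical: int, aligned_length: int) -> float:
    """Percent of aligned positions that carry the same residue."""
    if aligned_length <= 0:
        raise ValueError("aligned length must be positive")
    if not 0 <= identical <= aligned_length:
        raise ValueError("identical count must lie within the aligned length")
    return round(100.0 * identical / aligned_length, 2)


# M62096_PEA_1_P4: every one of the 719 aligned positions is identical.
p4_identity = percent_identity(719, 719)

# M62096_PEA_1_P8: a single mismatch over 737 positions (inferred, see above).
p8_identity = percent_identity(736, 737)

print(p4_identity, p8_identity)
```

On this reading, the P8 alignment differs from the known protein at one position that is still scored as similar, which would explain a similarity of 100.00 alongside an identity below 100.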

Expression of Homo sapiens Protein Tyrosine Phosphatase, Receptor Type, S (PTPRS) M62069 Transcripts Which are Detectable by Amplicon as Depicted in Sequence Name M62069 seg19 (SEQ ID NO: 1657) in Normal and Cancerous Lung Tissues

Expression of Homo sapiens protein tyrosine phosphatase, receptor type, S (PTPRS) transcripts detectable by or according to seg19, M62069 seg19 amplicon (SEQ ID NO: 1657) and M62069 seg19F (SEQ ID NO: 1655) and M62069 seg19R (SEQ ID NO: 1656) primers, was measured by real-time PCR. In parallel, the expression of four housekeeping genes was measured similarly: PBGD (GenBank Accession No. BC019323 (SEQ ID NO:1713); amplicon: PBGD-amplicon, SEQ ID NO:334), HPRT1 (GenBank Accession No. NM_000194 (SEQ ID NO:1714); amplicon: HPRT1-amplicon, SEQ ID NO:1297), Ubiquitin (GenBank Accession No. BC000449 (SEQ ID NO:1711); amplicon: Ubiquitin-amplicon, SEQ ID NO:328) and SDHA (GenBank Accession No. NM_004168 (SEQ ID NO:1712); amplicon: SDHA-amplicon, SEQ ID NO:331). For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 47-50, 90-93, 96-99, Table 2, above), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples.
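The normalization described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure actually used: all quantities below are invented for the example, the sample labels are hypothetical, and the function names `normalize` and `fold_up_regulation` do not appear in the application. The sketch uses `statistics.geometric_mean`, which is available in Python 3.8 and later.

```python
from statistics import geometric_mean, median


def normalize(target_quantity: float, housekeeping_quantities: list) -> float:
    """Divide the amplicon quantity by the geometric mean of the
    housekeeping-gene quantities measured in the same RT sample."""
    return target_quantity / geometric_mean(housekeeping_quantities)


def fold_up_regulation(normalized: dict, normal_ids: list) -> dict:
    """Divide every normalized quantity by the median of the normal samples."""
    baseline = median(normalized[sample_id] for sample_id in normal_ids)
    return {sample_id: q / baseline for sample_id, q in normalized.items()}


# Hypothetical data: (amplicon quantity, [PBGD, HPRT1, Ubiquitin, SDHA]).
raw = {
    "normal_1": (2.0, [1.0, 4.0, 1.0, 4.0]),
    "normal_2": (6.0, [2.0, 2.0, 2.0, 2.0]),
    "normal_3": (4.0, [2.0, 2.0, 2.0, 2.0]),
    "tumor_1": (24.0, [2.0, 2.0, 2.0, 2.0]),
}

normalized = {sid: normalize(q, hk) for sid, (q, hk) in raw.items()}
folds = fold_up_regulation(normalized, ["normal_1", "normal_2", "normal_3"])

for sample_id, fold in folds.items():
    print(f"{sample_id}: {fold:.2f}-fold")
```

The geometric mean is less sensitive than the arithmetic mean to a single housekeeping gene with an unusually high quantity, and the median of the normal samples is similarly robust to an outlying normal sample.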

FIG. 65 is a histogram showing over expression of the above-indicated Homo sapiens protein tyrosine phosphatase, receptor type, S (PTPRS) transcripts in cancerous lung samples relative to the normal samples. Values represent the average of duplicate experiments. Error bars indicate the minimal and maximal values obtained.

As is evident from FIG. 65, the expression of Homo sapiens protein tyrosine phosphatase, receptor type, S (PTPRS) transcripts detectable by the above amplicon(s) in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 47-50, 90-93, 96-99, Table 2). Notably, an over-expression of at least 5-fold was found in 2 out of 15 adenocarcinoma samples, and in 8 out of 8 small cell carcinoma samples.
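The way the histogram values and the "at least 5-fold" counts relate to duplicate measurements can be sketched as follows. The fold values and sample labels are hypothetical and chosen only to make the arithmetic easy to follow; they are not data from the application, and the helper names `summarize` and `count_overexpressed` are illustrative.

```python
def summarize(duplicates: tuple) -> dict:
    """Average two duplicate fold values and keep the minimum and maximum,
    which correspond to the bar height and the error-bar ends."""
    first, second = duplicates
    return {
        "mean": (first + second) / 2,
        "low": min(first, second),
        "high": max(first, second),
    }


def count_overexpressed(summaries: dict, threshold: float = 5.0) -> int:
    """Count samples whose averaged fold value reaches the threshold."""
    return sum(1 for s in summaries.values() if s["mean"] >= threshold)


# Hypothetical duplicate fold values for three samples.
duplicate_folds = {
    "sample_A": (6.0, 8.0),
    "sample_B": (1.0, 2.0),
    "sample_C": (4.0, 6.0),
}

summaries = {sid: summarize(d) for sid, d in duplicate_folds.items()}
n_over = count_overexpressed(summaries)

print(f"{n_over} of {len(summaries)} samples show at least 5-fold over-expression")
```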

Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: M62069 seg19F forward primer (SEQ ID NO: 1655); and M62069 seg19R reverse primer (SEQ ID NO: 1656).

The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: M62069 seg19 (SEQ ID NO: 1657).

Forward primer - M62069 seg19F (SEQ ID NO: 1655): GCTGATTGTCCCCATGAAGG
Reverse primer - M62069 seg19R (SEQ ID NO: 1656): TGGCATACGGGAACTCAGTG
Amplicon (SEQ ID NO: 1657): GCTGATTGTCCCCATGAAGGGCAGCCTTGAAGCTTGGTCAGTCTCCCTAACTGTATGATTGATCCCCACTTATTGCACTACATCACTGAGTTCCCGTATGC

Expression of Homo sapiens Protein Tyrosine Phosphatase, Receptor Type, S (PTPRS) M62069 Transcripts Which are Detectable by Amplicon as Depicted in Sequence Name M62069 seg29 (SEQ ID NO: 1660) in Normal and Cancerous Lung Tissues

Expression of Homo sapiens protein tyrosine phosphatase, receptor type, S (PTPRS) transcripts detectable by or according to seg29, M62069 seg29 amplicon (SEQ ID NO: 1660) and M62069 seg29F (SEQ ID NO: 1658) and M62069 seg29R (SEQ ID NO: 1659) primers, was measured by real-time PCR. In parallel, the expression of four housekeeping genes was measured similarly: PBGD (GenBank Accession No. BC019323 (SEQ ID NO:1713); amplicon: PBGD-amplicon, SEQ ID NO:334), HPRT1 (GenBank Accession No. NM_000194 (SEQ ID NO:1714); amplicon: HPRT1-amplicon, SEQ ID NO:1297), Ubiquitin (GenBank Accession No. BC000449 (SEQ ID NO:1711); amplicon: Ubiquitin-amplicon, SEQ ID NO:328) and SDHA (GenBank Accession No. NM_004168 (SEQ ID NO:1712); amplicon: SDHA-amplicon, SEQ ID NO:331). For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 47-50, 90-93, 96-99, Table 2, above), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples.

FIG. 66 is a histogram showing over expression of the above-indicated Homo sapiens protein tyrosine phosphatase, receptor type, S (PTPRS) transcripts in cancerous lung samples relative to the normal samples. Values represent the average of duplicate experiments. Error bars indicate the minimal and maximal values obtained.

As is evident from FIG. 66, the expression of Homo sapiens protein tyrosine phosphatase, receptor type, S (PTPRS) transcripts detectable by the above amplicon(s) in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 47-50, 90-93, 96-99, Table 2). Notably, an over-expression of at least 5-fold was found in 2 out of 15 adenocarcinoma samples, and in 7 out of 8 small cell carcinoma samples.

Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: M62069 seg29F forward primer (SEQ ID NO: 1658); and M62069 seg29R reverse primer (SEQ ID NO: 1659).

The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: M62069 seg29 (SEQ ID NO: 1660).

Forward primer - M62069 seg29F (SEQ ID NO: 1658): ATTGAATAATTCAGCACCTGAGGC
Reverse primer - M62069 seg29R (SEQ ID NO: 1659): TTCATATGGCTACTCCCCACCT
Amplicon (SEQ ID NO: 1660): ATTGAATAATTCAGCACCTGAGGCTGGTGGATGATTCTTTGCAATTTGGCAGGAATGGGAGAGTCGGGAGCAGTAGTTGGCAAGGTGGGGAGTAGCCATATGAA
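As a quick consistency check on the M62069 seg29 sequences listed above, the following Python sketch tests whether the amplicon begins with the forward primer and ends with the reverse complement of the reverse primer, as expected for a PCR product. The sequences are copied from the listing above with whitespace inside the amplicon removed; the function names are illustrative and the check assumes an unambiguous A/C/G/T alphabet.

```python
# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def primers_flank_amplicon(forward: str, reverse: str, amplicon: str) -> bool:
    """True if the amplicon starts with the forward primer and ends with
    the reverse complement of the reverse primer."""
    return amplicon.startswith(forward) and amplicon.endswith(
        reverse_complement(reverse)
    )


FORWARD = "ATTGAATAATTCAGCACCTGAGGC"
REVERSE = "TTCATATGGCTACTCCCCACCT"
AMPLICON = (
    "ATTGAATAATTCAGCACCTGAGGCTGGTGGATGATTCTTTGCAATTTGGCAGGAATGGGAGAGTCGGG"
    "AGCAGTAGTTGGCAAGGTGGGGAGTAGCCATATGAA"
)

seg29_consistent = primers_flank_amplicon(FORWARD, REVERSE, AMPLICON)
print("seg29 primers flank amplicon:", seg29_consistent)
```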

Description for Cluster M78076

Cluster M78076 features 9 transcript(s) and 35 segment(s) of interest, the names for which are given in Tables 654 and 655, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 656.

TABLE 654 Transcripts of interest
Transcript Name      Sequence ID No.
M78076_PEA_1_T2      74
M78076_PEA_1_T3      75
M78076_PEA_1_T5      76
M78076_PEA_1_T13     77
M78076_PEA_1_T15     78
M78076_PEA_1_T23     79
M78076_PEA_1_T26     80
M78076_PEA_1_T27     81
M78076_PEA_1_T28     82

TABLE 655 Segments of interest
Segment Name            Sequence ID No.
M78076_PEA_1_node_0     659
M78076_PEA_1_node_10    660
M78076_PEA_1_node_15    661
M78076_PEA_1_node_18    662
M78076_PEA_1_node_20    663
M78076_PEA_1_node_24    664
M78076_PEA_1_node_26    665
M78076_PEA_1_node_29    666
M78076_PEA_1_node_32    667
M78076_PEA_1_node_35    668
M78076_PEA_1_node_37    669
M78076_PEA_1_node_46    670
M78076_PEA_1_node_47    671
M78076_PEA_1_node_54    672
M78076_PEA_1_node_1     673
M78076_PEA_1_node_2     674
M78076_PEA_1_node_3     675
M78076_PEA_1_node_6     676
M78076_PEA_1_node_7     677
M78076_PEA_1_node_12    678
M78076_PEA_1_node_22    679
M78076_PEA_1_node_27    680
M78076_PEA_1_node_30    681
M78076_PEA_1_node_31    682
M78076_PEA_1_node_34    683
M78076_PEA_1_node_36    684
M78076_PEA_1_node_41    685
M78076_PEA_1_node_42    686
M78076_PEA_1_node_43    687
M78076_PEA_1_node_45    688
M78076_PEA_1_node_49    689
M78076_PEA_1_node_50    690
M78076_PEA_1_node_51    691
M78076_PEA_1_node_52    692
M78076_PEA_1_node_53    693

TABLE 656 Proteins of interest
Protein Name        Sequence ID No.  Corresponding Transcript(s)
M78076_PEA_1_P3     1350             M78076_PEA_1_T2 (SEQ ID NO: 74); M78076_PEA_1_T5 (SEQ ID NO: 76)
M78076_PEA_1_P4     1351             M78076_PEA_1_T3 (SEQ ID NO: 75)
M78076_PEA_1_P12    1352             M78076_PEA_1_T13 (SEQ ID NO: 77)
M78076_PEA_1_P14    1353             M78076_PEA_1_T15 (SEQ ID NO: 78)
M78076_PEA_1_P21    1354             M78076_PEA_1_T23 (SEQ ID NO: 79)
M78076_PEA_1_P24    1355             M78076_PEA_1_T26 (SEQ ID NO: 80)
M78076_PEA_1_P2     1356             M78076_PEA_1_T27 (SEQ ID NO: 81)
M78076_PEA_1_P25    1357             M78076_PEA_1_T28 (SEQ ID NO: 82)

These sequences are variants of the known protein Amyloid-like protein 1 precursor (SwissProt accession identifier APP1_HUMAN; known also according to the synonyms APLP; APLP-1), SEQ ID NO: 1439, referred to herein as the previously known protein.

Protein Amyloid-like protein 1 precursor (SEQ ID NO:1439) is known orbelieved to have the following function(s): May play a role inpostsynaptic function. The C-terminal gamma-secretase processedfragment, ALID1, activates transcription activation through APBB1 (Fe65)binding (By similarity). Couples to JIP signal transduction throughC-terminal binding. May interact with cellular G-protein signalingpathways. Can regulate neurite outgrowth through binding to componentsof the extracellular matrix such as heparin and collagen I. Thegamma-CTF peptide, C30, is a potent enhancer of neuronal apoptosis (Bysimilarity). The sequence for protein Amyloid-like protein 1 precursoris given at the end of the application, as “Amyloid-like protein 1precursor amino acid sequence”. Known polymorphisms for this sequenceare as shown in Table 657.

TABLE 657 Amino acid mutations for Known Protein

SNP position(s) on amino acid sequence  Comment
48                                      A -> P

Protein Amyloid-like protein 1 precursor (SEQ ID NO:1439) is believed to be localized as a Type I membrane protein. It is C-terminally processed in the Golgi complex.

The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: endocytosis; apoptosis; cell adhesion; neurogenesis; cell death, which are annotation(s) related to Biological Process; protein binding; heparin binding, which are annotation(s) related to Molecular Function; and basement membrane; coated pit; integral membrane protein, which are annotation(s) related to Cellular Component.

The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <dot expasy dot ch/sprot/>; or Locuslink, available from <dot ncbi dot nlm dot nih dot gov/projects/LocusLink/>.

As noted above, cluster M78076 features 9 transcript(s), which were listed in Table 654 above. These transcript(s) encode protein(s) which are variant(s) of protein Amyloid-like protein 1 precursor (SEQ ID NO:1439). A description of each variant protein according to the present invention is now provided.

Variant protein M78076_PEA_1_P3 (SEQ ID NO:1350) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M78076_PEA_1_T2 (SEQ ID NO:74). An alignment is given to the known protein (Amyloid-like protein 1 precursor (SEQ ID NO:1439)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between M78076_PEA_1_P3 (SEQ ID NO:1350) and APP1_HUMAN (SEQ ID NO:1439):

1. An isolated chimeric polypeptide encoding for M78076_PEA_1_P3 (SEQ ID NO:1350), comprising a first amino acid sequence being at least 90% homologous to MGPASPAARGLSRRPGQPPLPLLLPLLLLLLRAQPAIGSLAGGSPGAAEAPGSAQVAGLCGRLTLHRDLRTGRWEPDPQRSRRCLRDPQRVLEYCRQMYPELQIARVEQATQAIPMERWCGGSRSGSCAHPHHQVVPFRCLPGEFVSEALLVPEGCRFLHQERMDQCESSTRRHQEAQEACSSQGLILHGSGMLLPCGSDRFRGVEYVCCPPPGTPDPSGTAVGDPSTRSWPPGSRVEGAEDEEEEESFPQPVDDYFVEPPQAEEEEETVPPPSSHTLAVVGKVTPTPRPTDGVDIYFGMPGEISEHEGFLRAKMDLEERRMRQINEVMREWAMADNQSKNLPKADRQALNEHFQSILQTLEEQVSGERQRLVETHATRVIALINDQRRAALEGFLAALQADPPQAERVLLALRRYLRAEQKEQRHTLRHYQHVAAVDPEKAQQMRFQVHTHLQVIEERVNQSLGLLDQNPHLAQELRPQIQELLHSEHLGPSELEAPAPGGSSEDKGGLQPPDSKD corresponding to amino acids 1-517 of APP1_HUMAN (SEQ ID NO:1439), which also corresponds to amino acids 1-517 of M78076_PEA_1_P3 (SEQ ID NO:1350), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GE corresponding to amino acids 518-519 of M78076_PEA_1_P3 (SEQ ID NO:1350), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
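The decision rule stated above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name and the boolean inputs are hypothetical, the actual prediction programs and their outputs are not specified in this section, and the "undetermined" outcome for mixed predictions is an assumption rather than something the text describes. The membrane branch follows the reasoning given elsewhere in this section for variant M78076_PEA_1_P21.

```python
def predict_localization(signal_peptide_calls, transmembrane_calls):
    """Combine per-program predictions into a coarse localization call.

    signal_peptide_calls: list of booleans, one per signal-peptide predictor.
    transmembrane_calls:  list of booleans, one per trans-membrane predictor.
    Both inputs are hypothetical stand-ins for the programs' outputs.
    """
    has_signal = all(signal_peptide_calls)
    if has_signal and not any(transmembrane_calls):
        # Signal peptide predicted, no trans-membrane region: secreted.
        return "secreted"
    if has_signal and all(transmembrane_calls):
        # Signal peptide plus a downstream trans-membrane region: membrane.
        return "membrane"
    # Mixed or absent predictions are not covered by the text (assumption).
    return "undetermined"


# Two signal-peptide programs agree, neither trans-membrane program fires.
print(predict_localization([True, True], [False, False]))
```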

Variant protein M78076_PEA_1_P3 (SEQ ID NO:1350) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 658 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M78076_PEA_1_P3 (SEQ ID NO:1350) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 658 Amino acid mutations

SNP position(s) on amino acid sequence  Alternative amino acid(s)  Previously known SNP?
4                                       A -> P                     Yes
6                                       P -> H                     Yes
13                                      R -> H                     Yes
34                                      Q ->                       No
38                                      G -> R                     Yes
88                                      P -> R                     Yes
124                                     R -> Q                     Yes
127                                     S ->                       No
145                                     F -> S                     No
214                                     G -> R                     No
214                                     G ->                       No
262                                     Q ->                       No
270                                     V ->                       No
309                                     G -> E                     Yes
370                                     Q ->                       No

The glycosylation sites of variant protein M78076_PEA_1_P3 (SEQ ID NO:1350), as compared to the known protein Amyloid-like protein 1 precursor (SEQ ID NO:1439), are described in Table 659 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).

TABLE 659 Glycosylation site(s)

Position(s) on known amino acid sequence  Present in variant protein?  Position in variant protein?
337                                       yes                          337
461                                       yes                          461
551                                       no

Variant protein M78076_PEA_1_P3 (SEQ ID NO:1350) is encoded by the following transcript(s): M78076_PEA_1_T2 (SEQ ID NO:74), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M78076_PEA_1_T2 (SEQ ID NO:74) is shown in bold; this coding portion starts at position 142 and ends at position 1698. The transcript also has the following SNPs as listed in Table 660 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M78076_PEA_1_P3 (SEQ ID NO:1350) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 660 Nucleic acid SNPs

SNP position on nucleotide sequence  Alternative nucleic acid  Previously known SNP?
114                                  G ->                      No
151                                  G -> C                    Yes
158                                  C -> A                    Yes
179                                  G -> A                    Yes
219                                  A -> G                    Yes
243                                  G ->                      No
253                                  G -> A                    Yes
315                                  A -> G                    Yes
366                                  A -> G                    Yes
404                                  C -> G                    Yes
512                                  G -> A                    Yes
522                                  C ->                      No
522                                  C -> T                    No
575                                  T -> C                    No
781                                  G ->                      No
781                                  G -> A                    No
927                                  G ->                      No
951                                  C ->                      No
1067                                 G -> A                    Yes
1077                                 G -> A                    Yes
1251                                 G ->                      No
1398                                 G -> T                    Yes
1423                                 C -> T                    Yes
2146                                 G -> A                    Yes
2224                                 C -> T                    No
2362                                 C -> T                    Yes
2513                                 A -> G                    No
2656                                 C -> T                    Yes
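The nucleotide positions in Table 660 can be related to the amino acid positions in Table 658 by simple codon arithmetic. The sketch below assumes 1-based positions and an uninterrupted reading frame beginning at the stated start of the coding portion (position 142); the function name is hypothetical and the calculation is an illustration, not the procedure used to build the tables.

```python
CDS_START = 142   # first coding position of M78076_PEA_1_T2, as stated in the text
CDS_END = 1698    # last coding position of M78076_PEA_1_T2, as stated in the text


def codon_index(nt_pos, cds_start=CDS_START, cds_end=CDS_END):
    """Return the 1-based amino acid position hit by a 1-based nucleotide
    position, or None when the position lies outside the coding portion."""
    if nt_pos < cds_start or nt_pos > cds_end:
        return None
    return (nt_pos - cds_start) // 3 + 1


# A few positions taken from Table 660.
for nt in (114, 151, 158, 179, 1251, 2146):
    print(nt, codon_index(nt))
```

Under these assumptions, positions 151, 158, 179 and 1251 fall in codons 4, 6, 13 and 370, which are positions listed in Table 658, while positions 114 and 2146 lie outside the coding portion.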

Variant protein M78076_PEA_1_P4 (SEQ ID NO:1351) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M78076_PEA_1_T3 (SEQ ID NO:75). An alignment is given to the known protein (Amyloid-like protein 1 precursor (SEQ ID NO:1439)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between M78076_PEA_1_P4 (SEQ ID NO:1351) and APP1_HUMAN (SEQ ID NO:1439):

1. An isolated chimeric polypeptide encoding for M78076_PEA_1_P4 (SEQ ID NO:1351), comprising a first amino acid sequence being at least 90% homologous to MGPASPAARGLSRRPGQPPLPLLLPLLLLLLRAQPAIGSLAGGSPGAAEAPGSAQVAGLCGRLTLHRDLRTGRWEPDPQRSRRCLRDPQRVLEYCRQMYPELQIARVEQATQAIPMERWCGGSRSGSCAHPHHQVVPFRCLPGEFVSEALLVPEGCRFLHQERMDQCESSTRRHQEAQEACSSQGLILHGSGMLLPCGSDRFRGVEYVCCPPPGTPDPSGTAVGDPSTRSWPPGSRVEGAEDEEEEESFPQPVDDYFVEPPQAEEEEETVPPPSSHTLAVVGKVTPTPRPTDGVDIYFGMPGEISEHEGFLRAKMDLEERRMRQINEVMREWAMADNQSKNLPKADRQALNEHFQSILQTLEEQVSGERQRLVETHATRVIALINDQRRAALEGFLAALQADPPQAERVLLALRRYLRAEQKEQRHTLRHYQHVAAVDPEKAQQMRFQVHTHLQVIEERVNQSLGLLDQNPHLAQELRPQIQELLHSEHLGPSELEAPAPGGSSEDKGGLQPPDSKDDTPMTLPKG corresponding to amino acids 1-526 of APP1_HUMAN (SEQ ID NO:1439), which also corresponds to amino acids 1-526 of M78076_PEA_1_P4 (SEQ ID NO:1351), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence ECLTVNPSLQIPLNP (SEQ ID NO: 1718) corresponding to amino acids 527-541 of M78076_PEA_1_P4 (SEQ ID NO:1351), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of M78076_PEA_1_P4 (SEQ ID NO:1351), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ECLTVNPSLQIPLNP (SEQ ID NO: 1718) in M78076_PEA_1_P4 (SEQ ID NO:1351).
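As a rough illustration of how a candidate sequence might be screened against the percentage thresholds recited above, the sketch below computes ungapped percent identity against the tail sequence. This is an assumption-laden stand-in: this section does not define how homology is calculated, percent identity over equal-length sequences is only one simple possibility, and the function names and the candidate sequence are hypothetical.

```python
TAIL_P4 = "ECLTVNPSLQIPLNP"  # SEQ ID NO: 1718, as given in the text


def percent_identity(candidate, reference):
    """Ungapped percent identity between two equal-length sequences.
    Used here as a simple stand-in for 'homology' (assumption)."""
    if len(candidate) != len(reference):
        raise ValueError("ungapped comparison requires equal lengths")
    matches = sum(a == b for a, b in zip(candidate, reference))
    return 100.0 * matches / len(reference)


def meets_threshold(candidate, reference=TAIL_P4, threshold=70.0):
    """True when the candidate reaches the given percent threshold."""
    return percent_identity(candidate, reference) >= threshold


# Hypothetical candidate differing from the tail at its last residue only.
candidate = "ECLTVNPSLQIPLNA"
for threshold in (70.0, 80.0, 85.0, 90.0, 95.0):
    print(threshold, meets_threshold(candidate, threshold=threshold))
```

With a 15-residue tail, a single substitution leaves 14 of 15 positions identical, so under this simplified measure such a candidate would pass the 90% level but not the 95% level.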

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein M78076_PEA_1_P4 (SEQ ID NO:1351) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 661 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M78076_PEA_1_P4 (SEQ ID NO:1351) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 661 Amino acid mutations

SNP position(s) on amino acid sequence  Alternative amino acid(s)  Previously known SNP?
4                                       A -> P                     Yes
6                                       P -> H                     Yes
13                                      R -> H                     Yes
34                                      Q ->                       No
38                                      G -> R                     Yes
88                                      P -> R                     Yes
124                                     R -> Q                     Yes
127                                     S ->                       No
145                                     F -> S                     No
214                                     G -> R                     No
214                                     G ->                       No
262                                     Q ->                       No
270                                     V ->                       No
309                                     G -> E                     Yes
370                                     Q ->                       No

The glycosylation sites of variant protein M78076_PEA_1_P4 (SEQ ID NO:1351), as compared to the known protein Amyloid-like protein 1 precursor (SEQ ID NO:1439), are described in Table 662 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).

TABLE 662 Glycosylation site(s)

Position(s) on known amino acid sequence  Present in variant protein?  Position in variant protein?
337                                       yes                          337
461                                       yes                          461
551                                       no

Variant protein M78076_PEA_1_P4 (SEQ ID NO:1351) is encoded by the following transcript(s): M78076_PEA_1_T3 (SEQ ID NO:75), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M78076_PEA_1_T3 (SEQ ID NO:75) is shown in bold; this coding portion starts at position 142 and ends at position 1764. The transcript also has the following SNPs as listed in Table 663 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M78076_PEA_1_P4 (SEQ ID NO:1351) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 663 Nucleic acid SNPs

SNP position on nucleotide sequence  Alternative nucleic acid  Previously known SNP?
114                                  G ->                      No
151                                  G -> C                    Yes
158                                  C -> A                    Yes
179                                  G -> A                    Yes
219                                  A -> G                    Yes
243                                  G ->                      No
253                                  G -> A                    Yes
315                                  A -> G                    Yes
366                                  A -> G                    Yes
404                                  C -> G                    Yes
512                                  G -> A                    Yes
522                                  C ->                      No
522                                  C -> T                    No
575                                  T -> C                    No
781                                  G ->                      No
781                                  G -> A                    No
927                                  G ->                      No
951                                  C ->                      No
1067                                 G -> A                    Yes
1077                                 G -> A                    Yes
1251                                 G ->                      No
1398                                 G -> T                    Yes
1423                                 C -> T                    Yes
1817                                 G -> A                    Yes
2362                                 G -> A                    Yes
2440                                 C -> T                    No
2578                                 C -> T                    Yes
2729                                 A -> G                    No
2872                                 C -> T                    Yes

Variant protein M78076_PEA_1_P12 (SEQ ID NO:1352) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M78076_PEA_1_T13 (SEQ ID NO:77). An alignment is given to the known protein (Amyloid-like protein 1 precursor (SEQ ID NO:1439)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between M78076_PEA_1_P12 (SEQ ID NO:1352) and APP1_HUMAN (SEQ ID NO:1439):

1. An isolated chimeric polypeptide encoding for M78076_PEA_1_P12 (SEQ ID NO:1352), comprising a first amino acid sequence being at least 90% homologous to MGPASPAARGLSRRPGQPPLPLLLPLLLLLLRAQPAIGSLAGGSPGAAEAPGSAQVAGLCGRLTLHRDLRTGRWEPDPQRSRRCLRDPQRVLEYCRQMYPELQIARVEQATQAIPMERWCGGSRSGSCAHPHHQVVPFRCLPGEFVSEALLVPEGCRFLHQERMDQCESSTRRHQEAQEACSSQGLILHGSGMLLPCGSDRFRGVEYVCCPPPGTPDPSGTAVGDPSTRSWPPGSRVEGAEDEEEEESFPQPVDDYFVEPPQAEEEEETVPPPSSHTLAVVGKVTPTPRPTDGVDIYFGMPGEISEHEGFLRAKMDLEERRMRQINEVMREWAMADNQSKNLPKADRQALNEHFQSILQTLEEQVSGERQRLVETHATRVIALINDQRRAALEGFLAALQADPPQAERVLLALRRYLRAEQKEQRHTLRHYQHVAAVDPEKAQQMRFQVHTHLQVIEERVNQSLGLLDQNPHLAQELRPQIQELLHSEHLGPSELEAPAPGGSSEDKGGLQPPDSKDDTPMTLPKG corresponding to amino acids 1-526 of APP1_HUMAN (SEQ ID NO:1439), which also corresponds to amino acids 1-526 of M78076_PEA_1_P12 (SEQ ID NO:1352), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence ECVCSKGFPFPLIGDSEG (SEQ ID NO: 1719) corresponding to amino acids 527-544 of M78076_PEA_1_P12 (SEQ ID NO:1352), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of M78076_PEA_1_P12 (SEQ ID NO:1352), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ECVCSKGFPFPLIGDSEG (SEQ ID NO: 1719) in M78076_PEA_1_P12 (SEQ ID NO:1352).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein M78076_PEA_1_P12 (SEQ ID NO:1352) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 664 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M78076_PEA_1_P12 (SEQ ID NO:1352) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 664 Amino acid mutations

SNP position(s) on amino acid sequence  Alternative amino acid(s)  Previously known SNP?
4                                       A -> P                     Yes
6                                       P -> H                     Yes
13                                      R -> H                     Yes
34                                      Q ->                       No
38                                      G -> R                     Yes
88                                      P -> R                     Yes
124                                     R -> Q                     Yes
127                                     S ->                       No
145                                     F -> S                     No
214                                     G -> R                     No
214                                     G ->                       No
262                                     Q ->                       No
270                                     V ->                       No
309                                     G -> E                     Yes
370                                     Q ->                       No

The glycosylation sites of variant protein M78076_PEA_1_P12 (SEQ ID NO:1352), as compared to the known protein Amyloid-like protein 1 precursor (SEQ ID NO:1439), are described in Table 665 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).

TABLE 665 Glycosylation site(s)

Position(s) on known amino acid sequence  Present in variant protein?  Position in variant protein?
337                                       yes                          337
461                                       yes                          461
551                                       no

Variant protein M78076_PEA_1_P12 (SEQ ID NO:1352) is encoded by the following transcript(s): M78076_PEA_1_T13 (SEQ ID NO:77), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M78076_PEA_1_T13 (SEQ ID NO:77) is shown in bold; this coding portion starts at position 142 and ends at position 1773. The transcript also has the following SNPs as listed in Table 666 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M78076_PEA_1_P12 (SEQ ID NO:1352) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 666 Nucleic acid SNPs

SNP position on nucleotide sequence  Alternative nucleic acid  Previously known SNP?
114                                  G ->                      No
151                                  G -> C                    Yes
158                                  C -> A                    Yes
179                                  G -> A                    Yes
219                                  A -> G                    Yes
243                                  G ->                      No
253                                  G -> A                    Yes
315                                  A -> G                    Yes
366                                  A -> G                    Yes
404                                  C -> G                    Yes
512                                  G -> A                    Yes
522                                  C ->                      No
522                                  C -> T                    No
575                                  T -> C                    No
781                                  G ->                      No
781                                  G -> A                    No
927                                  G ->                      No
951                                  C ->                      No
1067                                 G -> A                    Yes
1077                                 G -> A                    Yes
1251                                 G ->                      No
1398                                 G -> T                    Yes
1423                                 C -> T                    Yes
1816                                 G -> A                    Yes
1894                                 C -> T                    No
2032                                 C -> T                    Yes
2183                                 A -> G                    No
2326                                 C -> T                    Yes

Variant protein M78076_PEA_1_P14 (SEQ ID NO:1353) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M78076_PEA_1_T15 (SEQ ID NO:78). An alignment is given to the known protein (Amyloid-like protein 1 precursor (SEQ ID NO:1439)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between M78076_PEA_1_P14 (SEQ ID NO:1353) and APP1_HUMAN (SEQ ID NO:1439):

1. An isolated chimeric polypeptide encoding for M78076_PEA_1_P14 (SEQ ID NO:1353), comprising a first amino acid sequence being at least 90% homologous to MGPASPAARGLSRRPGQPPLPLLLPLLLLLLRAQPAIGSLAGGSPGAAEAPGSAQVAGLCGRLTLHRDLRTGRWEPDPQRSRRCLRDPQRVLEYCRQMYPELQIARVEQATQAIPMERWCGGSRSGSCAHPHHQVVPFRCLPGEFVSEALLVPEGCRFLHQERMDQCESSTRRHQEAQEACSSQGLILHGSGMLLPCGSDRFRGVEYVCCPPPGTPDPSGTAVGDPSTRSWPPGSRVEGAEDEEEEESFPQPVDDYFVEPPQAEEEEETVPPPSSHTLAVVGKVTPTPRPTDGVDIYFGMPGEISEHEGFLRAKMDLEERRMRQINEVMREWAMADNQSKNLPKADRQALNEHFQSILQTLEEQVSGERQRLVETHATRVIALINDQRRAALEGFLAALQADPPQAERVLLALRRYLRAEQKEQRHTLRHYQHVAAVDPEKAQQMRFQVHTHLQVIEERVNQSLGLLDQNPHLAQELRPQIQELLHSEHLGPSELEAPAPGGSSEDKGGLQPPDSKDDTPMTLPKGSTEQDAASPEKEKMNPLEQYERKVNASVPRGFPFHSSEIQRDEL corresponding to amino acids 1-570 of APP1_HUMAN (SEQ ID NO:1439), which also corresponds to amino acids 1-570 of M78076_PEA_1_P14 (SEQ ID NO:1353), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRGGTAGYLGEETRGQRPGCDSQSHTGPSKKPSAPSPLPAGTSWDRGVP (SEQ ID NO: 1720) corresponding to amino acids 571-619 of M78076_PEA_1_P14 (SEQ ID NO:1353), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of M78076_PEA_1_P14 (SEQ ID NO:1353), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRGGTAGYLGEETRGQRPGCDSQSHTGPSKKPSAPSPLPAGTSWDRGVP (SEQ ID NO: 1720) in M78076_PEA_1_P14 (SEQ ID NO:1353).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein M78076_PEA_1_P14 (SEQ ID NO:1353) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 667 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M78076_PEA_1_P14 (SEQ ID NO:1353) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 667 Amino acid mutations

SNP position(s) on amino acid sequence  Alternative amino acid(s)  Previously known SNP?
4                                       A -> P                     Yes
6                                       P -> H                     Yes
13                                      R -> H                     Yes
34                                      Q ->                       No
38                                      G -> R                     Yes
88                                      P -> R                     Yes
124                                     R -> Q                     Yes
127                                     S ->                       No
145                                     F -> S                     No
214                                     G -> R                     No
214                                     G ->                       No
262                                     Q ->                       No
270                                     V ->                       No
309                                     G -> E                     Yes
370                                     Q ->                       No

The glycosylation sites of variant protein M78076_PEA_1_P14 (SEQ ID NO:1353), as compared to the known protein Amyloid-like protein 1 precursor (SEQ ID NO:1439), are described in Table 668 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).

TABLE 668 Glycosylation site(s)

Position(s) on known amino acid sequence  Present in variant protein?  Position in variant protein?
337                                       yes                          337
461                                       yes                          461
551                                       yes                          551

Variant protein M78076_PEA_1_P14 (SEQ ID NO:1353) is encoded by the following transcript(s): M78076_PEA_1_T15 (SEQ ID NO:78), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M78076_PEA_1_T15 (SEQ ID NO:78) is shown in bold; this coding portion starts at position 142 and ends at position 1998. The transcript also has the following SNPs as listed in Table 669 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M78076_PEA_1_P14 (SEQ ID NO:1353) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 669 Nucleic acid SNPs

SNP position on nucleotide sequence  Alternative nucleic acid  Previously known SNP?
114                                  G ->                      No
151                                  G -> C                    Yes
158                                  C -> A                    Yes
179                                  G -> A                    Yes
219                                  A -> G                    Yes
243                                  G ->                      No
253                                  G -> A                    Yes
315                                  A -> G                    Yes
366                                  A -> G                    Yes
404                                  C -> G                    Yes
512                                  G -> A                    Yes
522                                  C ->                      No
522                                  C -> T                    No
575                                  T -> C                    No
781                                  G ->                      No
781                                  G -> A                    No
927                                  G ->                      No
951                                  C ->                      No
1067                                 G -> A                    Yes
1077                                 G -> A                    Yes
1251                                 G ->                      No
1398                                 G -> T                    Yes
1423                                 C -> T                    Yes
2008                                 G -> A                    Yes
2086                                 C -> T                    No
2224                                 C -> T                    Yes
2375                                 A -> G                    No
2518                                 C -> T                    Yes

Variant protein M78076_PEA_1_P21 (SEQ ID NO:1354) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M78076_PEA_1_T23 (SEQ ID NO:79). An alignment is given to the known protein (Amyloid-like protein 1 precursor (SEQ ID NO:1439)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between M78076_PEA_1_P21 (SEQ ID NO:1354) and APP1_HUMAN (SEQ ID NO:1439):

1. An isolated chimeric polypeptide encoding for M78076_PEA_1_P21 (SEQ ID NO:1354), comprising a first amino acid sequence being at least 90% homologous to MGPASPAARGLSRRPGQPPLPLLLPLLLLLLRAQPAIGSLAGGSPGAAEAPGSAQVAGLCGRLTLHRDLRTGRWEPDPQRSRRCLRDPQRVLEYCRQMYPELQIARVEQATQAIPMERWCGGSRSGSCAHPHHQVVPFRCLPGEFVSEALLVPEGCRFLHQERMDQCESSTRRHQEAQEACSSQGLILHGSGMLLPCGSDRFRGVEYVCCPPPGTPDPSGTAVGDPSTRSWPPGSRVEGAEDEEEEESFPQPVDDYFVEPPQAEEEEETVPPPSSHTLAVVGKVTPTPRPTDGVDIYFGMPGEISEHEGFLRAKMDLEERRMRQINEVMREWAMADNQSKNLPKADRQALNE corresponding to amino acids 1-352 of APP1_HUMAN (SEQ ID NO:1439), which also corresponds to amino acids 1-352 of M78076_PEA_1_P21 (SEQ ID NO:1354), and a second amino acid sequence being at least 90% homologous to AERVLLALRRYLRAEQKEQRHTLRHYQHVAAVDPEKAQQMRFQVHTHLQVIEERVNQSLGLLDQNPHLAQELRPQIQELLHSEHLGPSELEAPAPGGSSEDKGGLQPPDSKDDTPMTLPKGSTEQDAASPEKEKMNPLEQYERKVNASVPRGFPFHSSEIQRDELAPAGTGVSREAVSGLLIMGAGGGSLIVLSMLLLRRKKPYGAISHGVVEVDPMLTLEEQQLRELQRHGYENPTYRFLEERP corresponding to amino acids 406-650 of APP1_HUMAN (SEQ ID NO:1439), which also corresponds to amino acids 353-597 of M78076_PEA_1_P21 (SEQ ID NO:1354), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated chimeric polypeptide encoding for an edge portion of M78076_PEA_1_P21 (SEQ ID NO:1354), comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise EA, having a structure as follows: a sequence starting from any of amino acid numbers 352−x to 352; and ending at any of amino acid numbers 353+((n−2)−x), in which x varies from 0 to n−2.
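The start and end formula recited above can be made concrete by enumerating the windows it describes. The following is a minimal sketch, assuming 1-based inclusive positions on M78076_PEA_1_P21 and a protein length of 597 residues (taken from the amino acid range 353-597 stated in the comparison report); the function name and the bounds check are illustrative additions, not part of the claim language.

```python
JUNCTION_LEFT = 352    # last residue of the first segment (E)
JUNCTION_RIGHT = 353   # first residue of the second segment (A)


def edge_windows(n, protein_length=597):
    """List (start, end) windows of length n that span the 352/353 junction.

    Follows the stated formula: start = 352 - x, end = 353 + ((n - 2) - x),
    with x running from 0 to n - 2. Positions are 1-based and inclusive.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to include both junction residues")
    windows = []
    for x in range(0, n - 1):  # x = 0 .. n-2
        start = JUNCTION_LEFT - x
        end = JUNCTION_RIGHT + (n - 2) - x
        # Keep only windows that fit inside the protein (illustrative check).
        if start >= 1 and end <= protein_length:
            windows.append((start, end))
    return windows


for start, end in edge_windows(10):
    print(start, end, end - start + 1)
```

For n = 10 this yields nine windows, from (352, 361) down to (344, 353); each has length 10 and contains both residue 352 and residue 353.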

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because although both signal-peptide prediction programs agree that this protein has a signal peptide, both trans-membrane region prediction programs predict that this protein has a trans-membrane region downstream of this signal peptide.

Variant protein M78076_PEA_1_P21 (SEQ ID NO:1354) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 670 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M78076_PEA_1_P21 (SEQ ID NO:1354) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 670 Amino acid mutations

SNP position(s) on amino acid sequence  Alternative amino acid(s)  Previously known SNP?
4                                       A -> P                     Yes
6                                       P -> H                     Yes
13                                      R -> H                     Yes
34                                      Q ->                       No
38                                      G -> R                     Yes
88                                      P -> R                     Yes
124                                     R -> Q                     Yes
127                                     S ->                       No
145                                     F -> S                     No
214                                     G -> R                     No
214                                     G ->                       No
262                                     Q ->                       No
270                                     V ->                       No
309                                     G -> E                     Yes

The glycosylation sites of variant protein M78076_PEA_1_P21 (SEQ ID NO:1354), as compared to the known protein Amyloid-like protein 1 precursor (SEQ ID NO:1439), are described in Table 671 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).

TABLE 671 Glycosylation site(s)

Position(s) on known amino acid sequence  Present in variant protein?  Position in variant protein?
337                                       yes                          337
461                                       yes                          408
551                                       yes                          498
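The shifted positions in Table 671 follow from the segment correspondence given in the comparison report for M78076_PEA_1_P21 (known amino acids 1-352 map to variant 1-352, and known 406-650 map to variant 353-597). A minimal sketch of that coordinate mapping is shown below; the block representation and function name are hypothetical, and positions are assumed to be 1-based and inclusive.

```python
# (known_start, known_end, variant_start) blocks taken from the comparison
# report for M78076_PEA_1_P21; the tuple layout itself is an illustrative choice.
P21_BLOCKS = [(1, 352, 1), (406, 650, 353)]


def map_position(known_pos, blocks=P21_BLOCKS):
    """Map a position on the known protein to the variant protein.
    Returns None when the position falls in a region absent from the variant."""
    for known_start, known_end, variant_start in blocks:
        if known_start <= known_pos <= known_end:
            return variant_start + (known_pos - known_start)
    return None


# Glycosylation sites listed for the known protein.
for site in (337, 461, 551):
    print(site, map_position(site))
```

Under these assumptions the sites at 337, 461 and 551 map to 337, 408 and 498, matching Table 671. Passing a single block such as [(1, 517, 1)] returns None for position 551, consistent with that site being reported as absent from the variants that diverge from the known protein before position 551.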

Variant protein M78076_PEA_1_P21 (SEQ ID NO:1354) is encoded by the following transcript(s): M78076_PEA_1_T23 (SEQ ID NO:79), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M78076_PEA_1_T23 (SEQ ID NO:79) is shown in bold; this coding portion starts at position 142 and ends at position 1932. The transcript also has the following SNPs as listed in Table 672 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M78076_PEA_1_P21 (SEQ ID NO:1354) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 672 Nucleic acid SNPs

SNP position on nucleotide sequence  Alternative nucleic acid  Previously known SNP?
114                                  G ->                      No
151                                  G -> C                    Yes
158                                  C -> A                    Yes
179                                  G -> A                    Yes
219                                  A -> G                    Yes
243                                  G ->                      No
253                                  G -> A                    Yes
315                                  A -> G                    Yes
366                                  A -> G                    Yes
404                                  C -> G                    Yes
512                                  G -> A                    Yes
522                                  C ->                      No
522                                  C -> T                    No
575                                  T -> C                    No
781                                  G ->                      No
781                                  G -> A                    No
927                                  G ->                      No
951                                  C ->                      No
1067                                 G -> A                    Yes
1077                                 G -> A                    Yes
1239                                 G -> T                    Yes
1264                                 C -> T                    Yes
1728                                 G -> A                    Yes
1806                                 C -> T                    No
1944                                 C -> T                    Yes
2095                                 A -> G                    No
2238                                 C -> T                    Yes

Variant protein M78076_PEA_1_P24 (SEQ ID NO:1355) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M78076_PEA_1_T26 (SEQ ID NO:80). An alignment is given to the known protein (Amyloid-like protein 1 precursor (SEQ ID NO:1439)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between M78076_PEA_1_P24 (SEQ ID NO:1355) and APP1_HUMAN (SEQ ID NO:1439):

1. An isolated chimeric polypeptide encoding for M78076_PEA_1_P24 (SEQ ID NO:1355), comprising a first amino acid sequence being at least 90% homologous to MGPASPAARGLSRRPGQPPLPLLLPLLLLLLRAQPAIGSLAGGSPGAAEAPGSAQVAGLCGRLTLHRDLRTGRWEPDPQRSRRCLRDPQRVLEYCRQMYPELQIARVEQATQAIPMERWCGGSRSGSCAHPHHQVVPFRCLPGEFVSEALLVPEGCRFLHQERMDQCESSTRRHQEAQEACSSQGLILHGSGMLLPCGSDRFRGVEYVCCPPPGTPDPSGTAVGDPSTRSWPPGSRVEGAEDEEEEESFPQPVDDYFVEPPQAEEEEETVPPPSSHTLAVVGKVTPTPRPTDGVDIYFGMPGEISEHEGFLRAKMDLEERRMRQINEVMREWAMADNQSKNLPKADRQALNEHFQSILQTLEEQVSGERQRLVETHATRVIALINDQRRAALEGFLAALQADPPQAERVLLALRRYLRAEQKEQRHTLRHYQHVAAVDPEKAQQMRFQVHTHLQVIEERVNQSLGLLDQNPHLAQELRPQI corresponding to amino acids 1-481 of APP1_HUMAN (SEQ ID NO:1439), which also corresponds to amino acids 1-481 of M78076_PEA_1_P24 (SEQ ID NO:1355), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence RECLLPWLPLQISEGRS (SEQ ID NO: 1721) corresponding to amino acids 482-498 of M78076_PEA_1_P24 (SEQ ID NO:1355), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of M78076_PEA_1_P24 (SEQ ID NO:1355), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence RECLLPWLPLQISEGRS (SEQ ID NO: 1721) in M78076_PEA_1_P24 (SEQ ID NO:1355).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein M78076_PEA_1_P24 (SEQ ID NO:1355) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 673 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M78076_PEA_1_P24 (SEQ ID NO:1355) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 673 Amino acid mutations

SNP position(s) on amino acid sequence  Alternative amino acid(s)  Previously known SNP?
4                                       A -> P                     Yes
6                                       P -> H                     Yes
13                                      R -> H                     Yes
34                                      Q ->                       No
38                                      G -> R                     Yes
88                                      P -> R                     Yes
124                                     R -> Q                     Yes
127                                     S ->                       No
145                                     F -> S                     No
214                                     G -> R                     No
214                                     G ->                       No
262                                     Q ->                       No
270                                     V ->                       No
309                                     G -> E                     Yes
370                                     Q ->                       No

The glycosylation sites of variant protein M78076_PEA_1_P24 (SEQ ID NO:1355), as compared to the known protein Amyloid-like protein 1 precursor (SEQ ID NO:1439), are described in Table 674 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).

TABLE 674
Glycosylation site(s)

Position(s) on known    Present in          Position in
amino acid sequence     variant protein?    variant protein?
337                     yes                 337
461                     yes                 461
551                     no

Variant protein M78076_PEA_1_P24 (SEQ ID NO:1355) is encoded by the following transcript(s): M78076_PEA_1_T26 (SEQ ID NO:80), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M78076_PEA_1_T26 (SEQ ID NO:80) is shown in bold; this coding portion starts at position 142 and ends at position 1635. The transcript also has the following SNPs as listed in Table 675 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M78076_PEA_1_P24 (SEQ ID NO:1355) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 675
Nucleic acid SNPs

SNP position on        Alternative      Previously
nucleotide sequence    nucleic acid     known SNP?
114                    G ->             No
151                    G -> C           Yes
158                    C -> A           Yes
179                    G -> A           Yes
219                    A -> G           Yes
243                    G ->             No
253                    G -> A           Yes
315                    A -> G           Yes
366                    A -> G           Yes
404                    C -> G           Yes
512                    G -> A           Yes
522                    C ->             No
522                    C -> T           No
575                    T -> C           No
781                    G ->             No
781                    G -> A           No
927                    G ->             No
951                    C ->             No
1067                   G -> A           Yes
1077                   G -> A           Yes
1251                   G ->             No
1398                   G -> T           Yes
1423                   C -> T           Yes
2184                   G -> A           Yes
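The amino acid positions in Table 673 appear to follow from the nucleotide positions in Table 675 and the stated start of the coding portion (position 142). The sketch below shows that arithmetic under the assumption that positions are 1-based and that the reading frame begins at the first coding nucleotide; the function name `codon_index` is illustrative and not part of the application.

```python
CDS_START = 142   # first coding nucleotide of M78076_PEA_1_T26, from the text
CDS_END = 1635    # last coding nucleotide, from the text


def codon_index(nt_pos: int, cds_start: int = CDS_START, cds_end: int = CDS_END):
    """Return the 1-based amino acid position for a 1-based nucleotide
    position, or None if the nucleotide lies outside the coding portion."""
    if nt_pos < cds_start or nt_pos > cds_end:
        return None
    return (nt_pos - cds_start) // 3 + 1


# Number of codons in the coding portion.
n_codons = (CDS_END - CDS_START + 1) // 3

# A few nucleotide SNP positions from Table 675.
examples = {pos: codon_index(pos) for pos in (114, 151, 158, 1251, 2184)}
```

Under these assumptions the coding portion spans 498 codons, matching the 498 amino acids of M78076_PEA_1_P24, and nucleotide positions 151, 158 and 1251 map to amino acid positions 4, 6 and 370 as listed in Table 673. Positions 114 and 2184 fall outside the coding portion.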

Variant protein M78076_PEA_1_P2 (SEQ ID NO:1356) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M78076_PEA_1_T27 (SEQ ID NO:81). An alignment is given to the known protein (Amyloid-like protein 1 precursor (SEQ ID NO:1439)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between M78076_PEA_1_P2 (SEQ ID NO:1356) and APP1_HUMAN (SEQ ID NO:1439):

1. An isolated chimeric polypeptide encoding for M78076_PEA_1_P2 (SEQ ID NO:1356), comprising a first amino acid sequence being at least 90% homologous to MGPASPAARGLSRRPGQPPLPLLLPLLLLLLRAQPAIGSLAGGSPGAAEAPGSAQVAGLCGRLTLHRDLRTGRWEPDPQRSRRCLRDPQRVLEYCRQMYPELQIARVEQATQAIPMERWCGGSRSGSCAHPHHQVVPFRCLPGEFVSEALLVPEGCRFLHQERMDQCESSTRRHQEAQEACSSQGLILHGSGMLLPCGSDRFRGVEYVCCPPPGTPDPSGTAVGDPSTRSWPPGSRVEGAEDEEEEESFPQPVDDYFVEPPQAEEEEETVPPPSSHTLAVVGKVTPTPRPTDGVDIYFGMPGEISEHEGFLRAKMDLEERRMRQINEVMREWAMADNQSKNLPKADRQALNEHFQSILQTLEEQVSGERQRLVETHATRVIALINDQRRAALEGFLAALQADPPQAERVLLALRRYLRAEQKEQRHTLRHYQHVAAVDPEKAQQMRFQV corresponding to amino acids 1-449 of APP1_HUMAN (SEQ ID NO:1439), which also corresponds to amino acids 1-449 of M78076_PEA_1_P2 (SEQ ID NO:1356), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LTSFQLPNAPLFLRRPRLRLFSCPLDPLSVSWTPSYPLNTASLPLPSLSAQLPDPETWTLTCCVFDPCFLALGFLLPPPSILCSVPWIFTAFPRIVFFFFFFLRQVLALSPRQESSVRSWLIATSTSWVQAILLPQPLE (SEQ ID NO:1722) corresponding to amino acids 450-588 of M78076_PEA_1_P2 (SEQ ID NO:1356), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of M78076_PEA_1_P2 (SEQ ID NO:1356), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LTSFQLPNAPLFLRRPRLRLFSCPLDPLSVSWTPSYPLNTASLPLPSLSAQLPDPETWTLTCCVFDPCFLALGFLLPPPSILCSVPWIFTAFPRIVFFFFFFLRQVLALSPRQESSVRSWLIATSTSWVQAILLPQPLE (SEQ ID NO: 1722) in M78076_PEA_1_P2 (SEQ ID NO:1356).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because although both signal-peptide prediction programs agree that this protein has a signal peptide, both trans-membrane region prediction programs predict that this protein has a trans-membrane region downstream of this signal peptide.
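The localization calls given for the variants of this cluster amount to a small decision rule. The sketch below restates it; the function name `predict_localization` and its boolean inputs are hypothetical, and any combination of predictions not described in the text (for example, no predicted signal peptide, or disagreement between programs) is reported as "undetermined" rather than guessed.

```python
def predict_localization(signal_peptide_calls, transmembrane_calls):
    """Combine per-program predictions into a localization label.

    Each argument is a sequence of booleans, one per prediction program
    (True = feature predicted). Only the two cases described in the text
    are handled; anything else is reported as undetermined.
    """
    all_sp = all(signal_peptide_calls)
    if all_sp and all(transmembrane_calls):
        return "membrane"
    if all_sp and not any(transmembrane_calls):
        return "secreted"
    return "undetermined"


# Pattern described for M78076_PEA_1_P2: signal peptide plus a
# trans-membrane region predicted by both programs.
p2_call = predict_localization([True, True], [True, True])

# Pattern described for M78076_PEA_1_P24: signal peptide, no
# trans-membrane region predicted by either program.
p24_call = predict_localization([True, True], [False, False])
```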

Variant protein M78076_PEA_1_P2 (SEQ ID NO:1356) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 676 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M78076_PEA_1_P2 (SEQ ID NO:1356) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 676
Amino acid mutations

SNP position(s) on     Alternative      Previously
amino acid sequence    amino acid(s)    known SNP?
4                      A -> P           Yes
6                      P -> H           Yes
13                     R -> H           Yes
34                     Q ->             No
38                     G -> R           Yes
88                     P -> R           Yes
124                    R -> Q           Yes
127                    S ->             No
145                    F -> S           No
214                    G -> R           No
214                    G ->             No
262                    Q ->             No
270                    V ->             No
309                    G -> E           Yes
370                    Q ->             No
520                    A -> S           Yes
546                    F ->             Yes
564                    S -> C           Yes

The glycosylation sites of variant protein M78076_PEA_1_P2 (SEQ ID NO:1356), as compared to the known protein Amyloid-like protein 1 precursor (SEQ ID NO:1439), are described in Table 677 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).

TABLE 677
Glycosylation site(s)

Position(s) on known    Present in          Position in
amino acid sequence     variant protein?    variant protein?
337                     yes                 337
461                     no
551                     no

Variant protein M78076_PEA_1_P2 (SEQ ID NO:1356) is encoded by the following transcript(s): M78076_PEA_1_T27 (SEQ ID NO:81), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M78076_PEA_1_T27 (SEQ ID NO:81) is shown in bold; this coding portion starts at position 142 and ends at position 1905. The transcript also has the following SNPs as listed in Table 678 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M78076_PEA_1_P2 (SEQ ID NO:1356) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 678
Nucleic acid SNPs

SNP position on        Alternative      Previously
nucleotide sequence    nucleic acid     known SNP?
114                    G ->             No
151                    G -> C           Yes
158                    C -> A           Yes
179                    G -> A           Yes
219                    A -> G           Yes
243                    G ->             No
253                    G -> A           Yes
315                    A -> G           Yes
366                    A -> G           Yes
404                    C -> G           Yes
512                    G -> A           Yes
522                    C ->             No
522                    C -> T           No
575                    T -> C           No
781                    G ->             No
781                    G -> A           No
927                    G ->             No
951                    C ->             No
1067                   G -> A           Yes
1077                   G -> A           Yes
1251                   G ->             No
1398                   G -> T           Yes
1423                   C -> T           Yes
1500                   C -> T           Yes
1699                   G -> T           Yes
1725                   G -> A           Yes
1777                   T ->             Yes
1831                   A -> T           Yes
2274                   A -> G           Yes
2525                   A -> G           Yes
2681                   G -> A           Yes
3831                   G -> A           Yes

Variant protein M78076_PEA_1_P25 (SEQ ID NO:1357) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) M78076_PEA_1_T28 (SEQ ID NO:82). An alignment is given to the known protein (Amyloid-like protein 1 precursor (SEQ ID NO:1439)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between M78076_PEA_1_P25 (SEQ ID NO:1357) and APP1_HUMAN (SEQ ID NO:1439):

1. An isolated chimeric polypeptide encoding for M78076_PEA_1_P25 (SEQ ID NO:1357), comprising a first amino acid sequence being at least 90% homologous to MGPASPAARGLSRRPGQPPLPLLLPLLLLLLRAQPAIGSLAGGSPGAAEAPGSAQVAGLCGRLTLHRDLRTGRWEPDPQRSRRCLRDPQRVLEYCRQMYPELQIARVEQATQAIPMERWCGGSRSGSCAHPHHQVVPFRCLPGEFVSEALLVPEGCRFLHQERMDQCESSTRRHQEAQEACSSQGLILHGSGMLLPCGSDRFRGVEYVCCPPPGTPDPSGTAVGDPSTRSWPPGSRVEGAEDEEEEESFPQPVDDYFVEPPQAEEEEETVPPPSSHTLAVVGKVTPTPRPTDGVDIYFGMPGEISEHEGFLRAKMDLEERRMRQINEVMREWAMADNQSKNLPKADRQALNEHFQSILQTLEEQVSGERQRLVETHATRVIALINDQRRAALEGFLAALQADPPQAERVLLALRRYLRAEQKEQRHTLRHYQHVAAVDPEKAQQMRFQ corresponding to amino acids 1-448 of APP1_HUMAN (SEQ ID NO:1439), which also corresponds to amino acids 1-448 of M78076_PEA_1_P25 (SEQ ID NO:1357), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence PQNPNSQPRAAGSLEVIISHPFVRRLEILISPFQFQNSIPKNSQIVPAASPRGTSSP (SEQ ID NO: 1723) corresponding to amino acids 449-505 of M78076_PEA_1_P25 (SEQ ID NO:1357), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of M78076_PEA_1_P25 (SEQ ID NO:1357), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence PQNPNSQPRAAGSLEVIISHPFVRRLEILISPFQFQNSIPKNSQIVPAASPRGTSSP (SEQ ID NO: 1723) in M78076_PEA_1_P25 (SEQ ID NO:1357).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein M78076_PEA_1_P25 (SEQ ID NO:1357) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 679 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M78076_PEA_1_P25 (SEQ ID NO:1357) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 679
Amino acid mutations

SNP position(s) on     Alternative      Previously
amino acid sequence    amino acid(s)    known SNP?
4                      A -> P           Yes
6                      P -> H           Yes
13                     R -> H           Yes
34                     Q ->             No
38                     G -> R           Yes
88                     P -> R           Yes
124                    R -> Q           Yes
127                    S ->             No
145                    F -> S           No
214                    G -> R           No
214                    G ->             No
262                    Q ->             No
270                    V ->             No
309                    G -> E           Yes
370                    Q ->             No

The glycosylation sites of variant protein M78076_PEA_1_P25 (SEQ ID NO:1357), as compared to the known protein Amyloid-like protein 1 precursor (SEQ ID NO:1439), are described in Table 680 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).

TABLE 680
Glycosylation site(s)

Position(s) on known    Present in          Position in
amino acid sequence     variant protein?    variant protein?
337                     yes                 337
461                     no
551                     no
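The entries in Tables 674, 677 and 680 are consistent with a simple check: a glycosylation site of the known protein is retained in a variant when its position lies within the N-terminal region that the variant shares with APP1_HUMAN. The sketch below encodes that check as an assumption for illustration; the application does not state that this is how the tables were produced. The names `SHARED_HEAD` and `retained_sites` are hypothetical.

```python
# Glycosylation positions on the known protein (first column of the tables).
KNOWN_SITES = (337, 461, 551)

# Last residue of the N-terminal region each variant shares with APP1_HUMAN,
# taken from the comparison reports (amino acids 1-481, 1-449 and 1-448).
SHARED_HEAD = {
    "M78076_PEA_1_P24": 481,
    "M78076_PEA_1_P2": 449,
    "M78076_PEA_1_P25": 448,
}


def retained_sites(variant: str) -> dict:
    """Map each known site to True if it falls inside the shared region."""
    head_end = SHARED_HEAD[variant]
    return {site: site <= head_end for site in KNOWN_SITES}


p24_sites = retained_sites("M78076_PEA_1_P24")
p25_sites = retained_sites("M78076_PEA_1_P25")
```

Under this assumption, M78076_PEA_1_P24 keeps the sites at 337 and 461, while M78076_PEA_1_P2 and M78076_PEA_1_P25 keep only the site at 337, in agreement with the three tables.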

Variant protein M78076_PEA_1_P25 (SEQ ID NO:1357) is encoded by the following transcript(s): M78076_PEA_1_T28 (SEQ ID NO:82), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript M78076_PEA_1_T28 (SEQ ID NO:82) is shown in bold; this coding portion starts at position 142 and ends at position 1656. The transcript also has the following SNPs as listed in Table 681 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein M78076_PEA_1_P25 (SEQ ID NO:1357) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 681
Nucleic acid SNPs

SNP position on        Alternative      Previously
nucleotide sequence    nucleic acid     known SNP?
114                    G ->             No
151                    G -> C           Yes
158                    C -> A           Yes
179                    G -> A           Yes
219                    A -> G           Yes
243                    G ->             No
253                    G -> A           Yes
315                    A -> G           Yes
366                    A -> G           Yes
404                    C -> G           Yes
512                    G -> A           Yes
522                    C ->             No
522                    C -> T           No
575                    T -> C           No
781                    G ->             No
781                    G -> A           No
927                    G ->             No
951                    C ->             No
1067                   G -> A           Yes
1077                   G -> A           Yes
1251                   G ->             No
1398                   G -> T           Yes
1423                   C -> T           Yes
1593                   A -> G           No
1736                   C -> T           Yes

As noted above, cluster M78076 features 35 segment(s), which were listed in Table 655 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster M78076_PEA_1_node_0 (SEQ ID NO:659) according to the present invention is supported by 47 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T2 (SEQ ID NO:74), M78076_PEA_1_T3 (SEQ ID NO:75), M78076_PEA_1_T5 (SEQ ID NO:76), M78076_PEA_1_T13 (SEQ ID NO:77), M78076_PEA_1_T15 (SEQ ID NO:78), M78076_PEA_1_T23 (SEQ ID NO:79), M78076_PEA_1_T26 (SEQ ID NO:80), M78076_PEA_1_T27 (SEQ ID NO:81) and M78076_PEA_1_T28 (SEQ ID NO:82). Table 682 below describes the starting and ending position of this segment on each transcript.

TABLE 682
Segment location on transcripts

Transcript name                     Segment starting    Segment ending
                                    position            position
M78076_PEA_1_T2 (SEQ ID NO: 74)     1                   160
M78076_PEA_1_T3 (SEQ ID NO: 75)     1                   160
M78076_PEA_1_T5 (SEQ ID NO: 76)     1                   160
M78076_PEA_1_T13 (SEQ ID NO: 77)    1                   160
M78076_PEA_1_T15 (SEQ ID NO: 78)    1                   160
M78076_PEA_1_T23 (SEQ ID NO: 79)    1                   160
M78076_PEA_1_T26 (SEQ ID NO: 80)    1                   160
M78076_PEA_1_T27 (SEQ ID NO: 81)    1                   160
M78076_PEA_1_T28 (SEQ ID NO: 82)    1                   160

Segment cluster M78076_PEA_1_node_10 (SEQ ID NO:660) according to the present invention is supported by 70 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T2 (SEQ ID NO:74), M78076_PEA_1_T3 (SEQ ID NO:75), M78076_PEA_1_T5 (SEQ ID NO:76), M78076_PEA_1_T13 (SEQ ID NO:77), M78076_PEA_1_T15 (SEQ ID NO:78), M78076_PEA_1_T23 (SEQ ID NO:79), M78076_PEA_1_T26 (SEQ ID NO:80), M78076_PEA_1_T27 (SEQ ID NO:81) and M78076_PEA_1_T28 (SEQ ID NO:82). Table 683 below describes the starting and ending position of this segment on each transcript.

TABLE 683
Segment location on transcripts

Transcript name                     Segment starting    Segment ending
                                    position            position
M78076_PEA_1_T2 (SEQ ID NO: 74)     433                 565
M78076_PEA_1_T3 (SEQ ID NO: 75)     433                 565
M78076_PEA_1_T5 (SEQ ID NO: 76)     433                 565
M78076_PEA_1_T13 (SEQ ID NO: 77)    433                 565
M78076_PEA_1_T15 (SEQ ID NO: 78)    433                 565
M78076_PEA_1_T23 (SEQ ID NO: 79)    433                 565
M78076_PEA_1_T26 (SEQ ID NO: 80)    433                 565
M78076_PEA_1_T27 (SEQ ID NO: 81)    433                 565
M78076_PEA_1_T28 (SEQ ID NO: 82)    433                 565

Segment cluster M78076_PEA_1_node_15 (SEQ ID NO:661) according to the present invention is supported by 74 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T2 (SEQ ID NO:74), M78076_PEA_1_T3 (SEQ ID NO:75), M78076_PEA_1_T5 (SEQ ID NO:76), M78076_PEA_1_T13 (SEQ ID NO:77), M78076_PEA_1_T15 (SEQ ID NO:78), M78076_PEA_1_T23 (SEQ ID NO:79), M78076_PEA_1_T26 (SEQ ID NO:80), M78076_PEA_1_T27 (SEQ ID NO:81) and M78076_PEA_1_T28 (SEQ ID NO:82). Table 684 below describes the starting and ending position of this segment on each transcript.

TABLE 684
Segment location on transcripts

Transcript name                     Segment starting    Segment ending
                                    position            position
M78076_PEA_1_T2 (SEQ ID NO: 74)     679                 812
M78076_PEA_1_T3 (SEQ ID NO: 75)     679                 812
M78076_PEA_1_T5 (SEQ ID NO: 76)     679                 812
M78076_PEA_1_T13 (SEQ ID NO: 77)    679                 812
M78076_PEA_1_T15 (SEQ ID NO: 78)    679                 812
M78076_PEA_1_T23 (SEQ ID NO: 79)    679                 812
M78076_PEA_1_T26 (SEQ ID NO: 80)    679                 812
M78076_PEA_1_T27 (SEQ ID NO: 81)    679                 812
M78076_PEA_1_T28 (SEQ ID NO: 82)    679                 812

Segment cluster M78076_PEA_1_node_18 (SEQ ID NO:662) according to the present invention is supported by 95 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T2 (SEQ ID NO:74), M78076_PEA_1_T3 (SEQ ID NO:75), M78076_PEA_1_T5 (SEQ ID NO:76), M78076_PEA_1_T13 (SEQ ID NO:77), M78076_PEA_1_T15 (SEQ ID NO:78), M78076_PEA_1_T23 (SEQ ID NO:79), M78076_PEA_1_T26 (SEQ ID NO:80), M78076_PEA_1_T27 (SEQ ID NO:81) and M78076_PEA_1_T28 (SEQ ID NO:82). Table 685 below describes the starting and ending position of this segment on each transcript.

TABLE 685
Segment location on transcripts

Transcript name                     Segment starting    Segment ending
                                    position            position
M78076_PEA_1_T2 (SEQ ID NO: 74)     813                 991
M78076_PEA_1_T3 (SEQ ID NO: 75)     813                 991
M78076_PEA_1_T5 (SEQ ID NO: 76)     813                 991
M78076_PEA_1_T13 (SEQ ID NO: 77)    813                 991
M78076_PEA_1_T15 (SEQ ID NO: 78)    813                 991
M78076_PEA_1_T23 (SEQ ID NO: 79)    813                 991
M78076_PEA_1_T26 (SEQ ID NO: 80)    813                 991
M78076_PEA_1_T27 (SEQ ID NO: 81)    813                 991
M78076_PEA_1_T28 (SEQ ID NO: 82)    813                 991

Segment cluster M78076_PEA_1_node_20 (SEQ ID NO:663) according to the present invention is supported by 99 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T2 (SEQ ID NO:74), M78076_PEA_1_T3 (SEQ ID NO:75), M78076_PEA_1_T5 (SEQ ID NO:76), M78076_PEA_1_T13 (SEQ ID NO:77), M78076_PEA_1_T15 (SEQ ID NO:78), M78076_PEA_1_T23 (SEQ ID NO:79), M78076_PEA_1_T26 (SEQ ID NO:80), M78076_PEA_1_T27 (SEQ ID NO:81) and M78076_PEA_1_T28 (SEQ ID NO:82). Table 686 below describes the starting and ending position of this segment on each transcript.

TABLE 686
Segment location on transcripts

Transcript name                     Segment starting    Segment ending
                                    position            position
M78076_PEA_1_T2 (SEQ ID NO: 74)     992                 1122
M78076_PEA_1_T3 (SEQ ID NO: 75)     992                 1122
M78076_PEA_1_T5 (SEQ ID NO: 76)     992                 1122
M78076_PEA_1_T13 (SEQ ID NO: 77)    992                 1122
M78076_PEA_1_T15 (SEQ ID NO: 78)    992                 1122
M78076_PEA_1_T23 (SEQ ID NO: 79)    992                 1122
M78076_PEA_1_T26 (SEQ ID NO: 80)    992                 1122
M78076_PEA_1_T27 (SEQ ID NO: 81)    992                 1122
M78076_PEA_1_T28 (SEQ ID NO: 82)    992                 1122

Segment cluster M78076_PEA_1_node_24 (SEQ ID NO:664) according to the present invention is supported by 105 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T2 (SEQ ID NO:74), M78076_PEA_1_T3 (SEQ ID NO:75), M78076_PEA_1_T5 (SEQ ID NO:76), M78076_PEA_1_T13 (SEQ ID NO:77), M78076_PEA_1_T15 (SEQ ID NO:78), M78076_PEA_1_T26 (SEQ ID NO:80), M78076_PEA_1_T27 (SEQ ID NO:81) and M78076_PEA_1_T28 (SEQ ID NO:82). Table 687 below describes the starting and ending position of this segment on each transcript.

TABLE 687
Segment location on transcripts

Transcript name                     Segment starting    Segment ending
                                    position            position
M78076_PEA_1_T2 (SEQ ID NO: 74)     1198                1356
M78076_PEA_1_T3 (SEQ ID NO: 75)     1198                1356
M78076_PEA_1_T5 (SEQ ID NO: 76)     1198                1356
M78076_PEA_1_T13 (SEQ ID NO: 77)    1198                1356
M78076_PEA_1_T15 (SEQ ID NO: 78)    1198                1356
M78076_PEA_1_T26 (SEQ ID NO: 80)    1198                1356
M78076_PEA_1_T27 (SEQ ID NO: 81)    1198                1356
M78076_PEA_1_T28 (SEQ ID NO: 82)    1198                1356

Segment cluster M78076_PEA_1_node_26 (SEQ ID NO:665) according to the present invention is supported by 99 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T2 (SEQ ID NO:74), M78076_PEA_1_T3 (SEQ ID NO:75), M78076_PEA_1_T5 (SEQ ID NO:76), M78076_PEA_1_T13 (SEQ ID NO:77), M78076_PEA_1_T15 (SEQ ID NO:78), M78076_PEA_1_T23 (SEQ ID NO:79), M78076_PEA_1_T26 (SEQ ID NO:80), M78076_PEA_1_T27 (SEQ ID NO:81) and M78076_PEA_1_T28 (SEQ ID NO:82). Table 688 below describes the starting and ending position of this segment on each transcript.

TABLE 688
Segment location on transcripts

Transcript name                     Segment starting    Segment ending
                                    position            position
M78076_PEA_1_T2 (SEQ ID NO: 74)     1357                1485
M78076_PEA_1_T3 (SEQ ID NO: 75)     1357                1485
M78076_PEA_1_T5 (SEQ ID NO: 76)     1357                1485
M78076_PEA_1_T13 (SEQ ID NO: 77)    1357                1485
M78076_PEA_1_T15 (SEQ ID NO: 78)    1357                1485
M78076_PEA_1_T23 (SEQ ID NO: 79)    1198                1326
M78076_PEA_1_T26 (SEQ ID NO: 80)    1357                1485
M78076_PEA_1_T27 (SEQ ID NO: 81)    1357                1485
M78076_PEA_1_T28 (SEQ ID NO: 82)    1357                1485
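Table 688 places node_26 at positions 1357-1485 on most transcripts but at 1198-1326 on M78076_PEA_1_T23. That shift equals the length of node_24 (Table 687), which is not listed for M78076_PEA_1_T23. The sketch below checks this arithmetic, assuming 1-based inclusive coordinates; the helper name `segment_length` is illustrative.

```python
def segment_length(start: int, end: int) -> int:
    """Length of a segment given 1-based inclusive coordinates."""
    return end - start + 1


# Coordinates taken from Tables 687 and 688.
node_24 = (1198, 1356)            # not listed for M78076_PEA_1_T23
node_26_most = (1357, 1485)       # e.g. on M78076_PEA_1_T2
node_26_t23 = (1198, 1326)        # on M78076_PEA_1_T23

# Offset of node_26 between the two coordinate systems.
shift = node_26_most[0] - node_26_t23[0]

# The segment itself should have the same length on every transcript.
same_length = segment_length(*node_26_most) == segment_length(*node_26_t23)
```

Here `shift` and the length of node_24 are both 159 nucleotides, and node_26 is 129 nucleotides long in either position, which is what one would expect if M78076_PEA_1_T23 simply skips node_24.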

Segment cluster M78076_PEA_1_node_29 (SEQ ID NO:666) according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T27 (SEQ ID NO:81). Table 689 below describes the starting and ending position of this segment on each transcript.

TABLE 689
Segment location on transcripts

Transcript name                     Segment starting    Segment ending
                                    position            position
M78076_PEA_1_T27 (SEQ ID NO: 81)    1490                3132

Segment cluster M78076_PEA_1_node_32 (SEQ ID NO:667) according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T26 (SEQ ID NO:80) and M78076_PEA_1_T27 (SEQ ID NO:81). Table 690 below describes the starting and ending position of this segment on each transcript.

TABLE 690
Segment location on transcripts

Transcript name                     Segment starting    Segment ending
                                    position            position
M78076_PEA_1_T26 (SEQ ID NO: 80)    1586                2457
M78076_PEA_1_T27 (SEQ ID NO: 81)    3233                4104

Segment cluster M78076_PEA_1_node_35 (SEQ ID NO:668) according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T2 (SEQ ID NO:74) and M78076_PEA_1_T5 (SEQ ID NO:76). Table 691 below describes the starting and ending position of this segment on each transcript.

TABLE 691
Segment location on transcripts

Transcript name                     Segment starting    Segment ending
                                    position            position
M78076_PEA_1_T2 (SEQ ID NO: 74)     1694                1952
M78076_PEA_1_T5 (SEQ ID NO: 76)     1694                1952

Segment cluster M78076_PEA_1_node_37 (SEQ ID NO:669) according to the present invention is supported by 11 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T3 (SEQ ID NO:75) and M78076_PEA_1_T5 (SEQ ID NO:76). Table 692 below describes the starting and ending position of this segment on each transcript.

TABLE 692
Segment location on transcripts

Transcript name                     Segment starting    Segment ending
                                    position            position
M78076_PEA_1_T3 (SEQ ID NO: 75)     1718                2180
M78076_PEA_1_T5 (SEQ ID NO: 76)     1977                2439

Segment cluster M78076_PEA_1_node_46 (SEQ ID NO:670) according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T15 (SEQ ID NO:78). Table 693 below describes the starting and ending position of this segment on each transcript.

TABLE 693
Segment location on transcripts

Transcript name                     Segment starting    Segment ending
                                    position            position
M78076_PEA_1_T15 (SEQ ID NO: 78)    1852                1972

Segment cluster M78076_PEA_1_node_47 (SEQ ID NO:671) according to the present invention is supported by 155 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T2 (SEQ ID NO:74), M78076_PEA_1_T3 (SEQ ID NO:75), M78076_PEA_1_T5 (SEQ ID NO:76), M78076_PEA_1_T13 (SEQ ID NO:77), M78076_PEA_1_T15 (SEQ ID NO:78) and M78076_PEA_1_T23 (SEQ ID NO:79). Table 694 below describes the starting and ending position of this segment on each transcript.

TABLE 694
Segment location on transcripts

Transcript name                     Segment starting    Segment ending
                                    position            position
M78076_PEA_1_T2 (SEQ ID NO: 74)     2111                2254
M78076_PEA_1_T3 (SEQ ID NO: 75)     2327                2470
M78076_PEA_1_T5 (SEQ ID NO: 76)     2586                2729
M78076_PEA_1_T13 (SEQ ID NO: 77)    1781                1924
M78076_PEA_1_T15 (SEQ ID NO: 78)    1973                2116
M78076_PEA_1_T23 (SEQ ID NO: 79)    1693                1836

Segment cluster M78076_PEA_1_node_54 (SEQ ID NO:672) according to the present invention is supported by 133 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T2 (SEQ ID NO:74), M78076_PEA_1_T3 (SEQ ID NO:75), M78076_PEA_1_T5 (SEQ ID NO:76), M78076_PEA_1_T13 (SEQ ID NO:77), M78076_PEA_1_T15 (SEQ ID NO:78), M78076_PEA_1_T23 (SEQ ID NO:79) and M78076_PEA_1_T28 (SEQ ID NO:82). Table 695 below describes the starting and ending position of this segment on each transcript.

TABLE 695
Segment location on transcripts

Transcript name                     Segment starting    Segment ending
                                    position            position
M78076_PEA_1_T2 (SEQ ID NO: 74)     2412                2715
M78076_PEA_1_T3 (SEQ ID NO: 75)     2628                2931
M78076_PEA_1_T5 (SEQ ID NO: 76)     2887                3190
M78076_PEA_1_T13 (SEQ ID NO: 77)    2082                2385
M78076_PEA_1_T15 (SEQ ID NO: 78)    2274                2577
M78076_PEA_1_T23 (SEQ ID NO: 79)    1994                2297
M78076_PEA_1_T28 (SEQ ID NO: 82)    1492                1795

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
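The split between the segments described above and the short segments that follow can be expressed as a length cutoff. The sketch below applies a cutoff of 120 bp, reading "up to about 120 bp" as "at most 120"; that reading and the helper name `is_short_segment` are assumptions. Coordinates are taken from Tables 693 and 696 and are treated as 1-based and inclusive.

```python
SHORT_SEGMENT_MAX_BP = 120  # assumed reading of "up to about 120 bp"


def is_short_segment(start: int, end: int,
                     max_bp: int = SHORT_SEGMENT_MAX_BP) -> bool:
    """True if a segment with 1-based inclusive coordinates is at most
    max_bp nucleotides long."""
    return (end - start + 1) <= max_bp


# node_46 on M78076_PEA_1_T15 (Table 693): positions 1852-1972
node_46_short = is_short_segment(1852, 1972)

# node_1 on M78076_PEA_1_T2 (Table 696): positions 161-204
node_1_short = is_short_segment(161, 204)
```

With these coordinates node_46 is 121 nucleotides long, just above the cutoff, and it is described with the longer segments; node_1 is 44 nucleotides long and is listed among the short ones.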

Segment cluster M78076_PEA_1_node_1 (SEQ ID NO:673) according to the present invention is supported by 47 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T2 (SEQ ID NO:74), M78076_PEA_1_T3 (SEQ ID NO:75), M78076_PEA_1_T5 (SEQ ID NO:76), M78076_PEA_1_T13 (SEQ ID NO:77), M78076_PEA_1_T15 (SEQ ID NO:78), M78076_PEA_1_T23 (SEQ ID NO:79), M78076_PEA_1_T26 (SEQ ID NO:80), M78076_PEA_1_T27 (SEQ ID NO:81) and M78076_PEA_1_T28 (SEQ ID NO:82). Table 696 below describes the starting and ending position of this segment on each transcript.

TABLE 696
Segment location on transcripts

Transcript name                     Segment starting    Segment ending
                                    position            position
M78076_PEA_1_T2 (SEQ ID NO: 74)     161                 204
M78076_PEA_1_T3 (SEQ ID NO: 75)     161                 204
M78076_PEA_1_T5 (SEQ ID NO: 76)     161                 204
M78076_PEA_1_T13 (SEQ ID NO: 77)    161                 204
M78076_PEA_1_T15 (SEQ ID NO: 78)    161                 204
M78076_PEA_1_T23 (SEQ ID NO: 79)    161                 204
M78076_PEA_1_T26 (SEQ ID NO: 80)    161                 204
M78076_PEA_1_T27 (SEQ ID NO: 81)    161                 204
M78076_PEA_1_T28 (SEQ ID NO: 82)    161                 204

Segment cluster M78076_PEA_1_node_2 (SEQ ID NO:674) according to the present invention can be found in the following transcript(s): M78076_PEA_1_T2 (SEQ ID NO:74), M78076_PEA_1_T3 (SEQ ID NO:75), M78076_PEA_1_T5 (SEQ ID NO:76), M78076_PEA_1_T13 (SEQ ID NO:77), M78076_PEA_1_T15 (SEQ ID NO:78), M78076_PEA_1_T23 (SEQ ID NO:79), M78076_PEA_1_T26 (SEQ ID NO:80), M78076_PEA_1_T27 (SEQ ID NO:81) and M78076_PEA_1_T28 (SEQ ID NO:82). Table 697 below describes the starting and ending position of this segment on each transcript.

TABLE 697
Segment location on transcripts

Transcript name                     Segment starting    Segment ending
                                    position            position
M78076_PEA_1_T2 (SEQ ID NO: 74)     205                 224
M78076_PEA_1_T3 (SEQ ID NO: 75)     205                 224
M78076_PEA_1_T5 (SEQ ID NO: 76)     205                 224
M78076_PEA_1_T13 (SEQ ID NO: 77)    205                 224
M78076_PEA_1_T15 (SEQ ID NO: 78)    205                 224
M78076_PEA_1_T23 (SEQ ID NO: 79)    205                 224
M78076_PEA_1_T26 (SEQ ID NO: 80)    205                 224
M78076_PEA_1_T27 (SEQ ID NO: 81)    205                 224
M78076_PEA_1_T28 (SEQ ID NO: 82)    205                 224

Segment cluster M78076_PEA_1_node_3 (SEQ ID NO:675) according to the present invention is supported by 52 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T2 (SEQ ID NO:74), M78076_PEA_1_T3 (SEQ ID NO:75), M78076_PEA_1_T5 (SEQ ID NO:76), M78076_PEA_1_T13 (SEQ ID NO:77), M78076_PEA_1_T15 (SEQ ID NO:78), M78076_PEA_1_T23 (SEQ ID NO:79), M78076_PEA_1_T26 (SEQ ID NO:80), M78076_PEA_1_T27 (SEQ ID NO:81) and M78076_PEA_1_T28 (SEQ ID NO:82). Table 698 below describes the starting and ending position of this segment on each transcript.

TABLE 698
Segment location on transcripts

Transcript name                     Segment starting    Segment ending
                                    position            position
M78076_PEA_1_T2 (SEQ ID NO: 74)     225                 288
M78076_PEA_1_T3 (SEQ ID NO: 75)     225                 288
M78076_PEA_1_T5 (SEQ ID NO: 76)     225                 288
M78076_PEA_1_T13 (SEQ ID NO: 77)    225                 288
M78076_PEA_1_T15 (SEQ ID NO: 78)    225                 288
M78076_PEA_1_T23 (SEQ ID NO: 79)    225                 288
M78076_PEA_1_T26 (SEQ ID NO: 80)    225                 288
M78076_PEA_1_T27 (SEQ ID NO: 81)    225                 288
M78076_PEA_1_T28 (SEQ ID NO: 82)    225                 288

Segment cluster M78076_PEA_1_node_6 (SEQ ID NO:676) according to the present invention is supported by 59 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T2 (SEQ ID NO:74), M78076_PEA_1_T3 (SEQ ID NO:75), M78076_PEA_1_T5 (SEQ ID NO:76), M78076_PEA_1_T13 (SEQ ID NO:77), M78076_PEA_1_T15 (SEQ ID NO:78), M78076_PEA_1_T23 (SEQ ID NO:79), M78076_PEA_1_T26 (SEQ ID NO:80), M78076_PEA_1_T27 (SEQ ID NO:81) and M78076_PEA_1_T28 (SEQ ID NO:82). Table 699 below describes the starting and ending position of this segment on each transcript.

TABLE 699
Segment location on transcripts

Transcript name                     Segment starting    Segment ending
                                    position            position
M78076_PEA_1_T2 (SEQ ID NO: 74)     289                 370
M78076_PEA_1_T3 (SEQ ID NO: 75)     289                 370
M78076_PEA_1_T5 (SEQ ID NO: 76)     289                 370
M78076_PEA_1_T13 (SEQ ID NO: 77)    289                 370
M78076_PEA_1_T15 (SEQ ID NO: 78)    289                 370
M78076_PEA_1_T23 (SEQ ID NO: 79)    289                 370
M78076_PEA_1_T26 (SEQ ID NO: 80)    289                 370
M78076_PEA_1_T27 (SEQ ID NO: 81)    289                 370
M78076_PEA_1_T28 (SEQ ID NO: 82)    289                 370

Segment cluster M78076_PEA_1_node_7 (SEQ ID NO:677) according to the present invention is supported by 64 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T2 (SEQ ID NO:74), M78076_PEA_1_T3 (SEQ ID NO:75), M78076_PEA_1_T5 (SEQ ID NO:76), M78076_PEA_1_T13 (SEQ ID NO:77), M78076_PEA_1_T15 (SEQ ID NO:78), M78076_PEA_1_T23 (SEQ ID NO:79), M78076_PEA_1_T26 (SEQ ID NO:80), M78076_PEA_1_T27 (SEQ ID NO:81) and M78076_PEA_1_T28 (SEQ ID NO:82). Table 700 below describes the starting and ending position of this segment on each transcript.

TABLE 700
Segment location on transcripts

Transcript name                     Segment starting    Segment ending
                                    position            position
M78076_PEA_1_T2 (SEQ ID NO: 74)     371                 432
M78076_PEA_1_T3 (SEQ ID NO: 75)     371                 432
M78076_PEA_1_T5 (SEQ ID NO: 76)     371                 432
M78076_PEA_1_T13 (SEQ ID NO: 77)    371                 432
M78076_PEA_1_T15 (SEQ ID NO: 78)    371                 432
M78076_PEA_1_T23 (SEQ ID NO: 79)    371                 432
M78076_PEA_1_T26 (SEQ ID NO: 80)    371                 432
M78076_PEA_1_T27 (SEQ ID NO: 81)    371                 432
M78076_PEA_1_T28 (SEQ ID NO: 82)    371                 432

Segment cluster M78076_PEA_1_node_12 (SEQ ID NO:678) according to the present invention is supported by 71 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T2 (SEQ ID NO:74), M78076_PEA_1_T3 (SEQ ID NO:75), M78076_PEA_1_T5 (SEQ ID NO:76), M78076_PEA_1_T13 (SEQ ID NO:77), M78076_PEA_1_T15 (SEQ ID NO:78), M78076_PEA_1_T23 (SEQ ID NO:79), M78076_PEA_1_T26 (SEQ ID NO:80), M78076_PEA_1_T27 (SEQ ID NO:81) and M78076_PEA_1_T28 (SEQ ID NO:82). Table 701 below describes the starting and ending position of this segment on each transcript.

TABLE 701 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
M78076_PEA_1_T2 (SEQ ID NO: 74) | 566 | 678
M78076_PEA_1_T3 (SEQ ID NO: 75) | 566 | 678
M78076_PEA_1_T5 (SEQ ID NO: 76) | 566 | 678
M78076_PEA_1_T13 (SEQ ID NO: 77) | 566 | 678
M78076_PEA_1_T15 (SEQ ID NO: 78) | 566 | 678
M78076_PEA_1_T23 (SEQ ID NO: 79) | 566 | 678
M78076_PEA_1_T26 (SEQ ID NO: 80) | 566 | 678
M78076_PEA_1_T27 (SEQ ID NO: 81) | 566 | 678
M78076_PEA_1_T28 (SEQ ID NO: 82) | 566 | 678

Segment cluster M78076_PEA_1_node_22 (SEQ ID NO:679) according to the present invention is supported by 92 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T2 (SEQ ID NO:74), M78076_PEA_1_T3 (SEQ ID NO:75), M78076_PEA_1_T5 (SEQ ID NO:76), M78076_PEA_1_T13 (SEQ ID NO:77), M78076_PEA_1_T15 (SEQ ID NO:78), M78076_PEA_1_T23 (SEQ ID NO:79), M78076_PEA_1_T26 (SEQ ID NO:80), M78076_PEA_1_T27 (SEQ ID NO:81) and M78076_PEA_1_T28 (SEQ ID NO:82). Table 702 below describes the starting and ending position of this segment on each transcript.

TABLE 702 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
M78076_PEA_1_T2 (SEQ ID NO: 74) | 1123 | 1197
M78076_PEA_1_T3 (SEQ ID NO: 75) | 1123 | 1197
M78076_PEA_1_T5 (SEQ ID NO: 76) | 1123 | 1197
M78076_PEA_1_T13 (SEQ ID NO: 77) | 1123 | 1197
M78076_PEA_1_T15 (SEQ ID NO: 78) | 1123 | 1197
M78076_PEA_1_T23 (SEQ ID NO: 79) | 1123 | 1197
M78076_PEA_1_T26 (SEQ ID NO: 80) | 1123 | 1197
M78076_PEA_1_T27 (SEQ ID NO: 81) | 1123 | 1197
M78076_PEA_1_T28 (SEQ ID NO: 82) | 1123 | 1197

Segment cluster M78076_PEA_1_node_27 (SEQ ID NO:680) according to the present invention can be found in the following transcript(s): M78076_PEA_1_T27 (SEQ ID NO:81). Table 703 below describes the starting and ending position of this segment on each transcript.

TABLE 703 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
M78076_PEA_1_T27 (SEQ ID NO: 81) | 1486 | 1489

Segment cluster M78076_PEA_1_node_30 (SEQ ID NO:681) according to the present invention is supported by 90 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T2 (SEQ ID NO:74), M78076_PEA_1_T3 (SEQ ID NO:75), M78076_PEA_1_T5 (SEQ ID NO:76), M78076_PEA_1_T13 (SEQ ID NO:77), M78076_PEA_1_T15 (SEQ ID NO:78), M78076_PEA_1_T23 (SEQ ID NO:79), M78076_PEA_1_T26 (SEQ ID NO:80) and M78076_PEA_1_T27 (SEQ ID NO:81). Table 704 below describes the starting and ending position of this segment on each transcript.

TABLE 704 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
M78076_PEA_1_T2 (SEQ ID NO: 74) | 1486 | 1557
M78076_PEA_1_T3 (SEQ ID NO: 75) | 1486 | 1557
M78076_PEA_1_T5 (SEQ ID NO: 76) | 1486 | 1557
M78076_PEA_1_T13 (SEQ ID NO: 77) | 1486 | 1557
M78076_PEA_1_T15 (SEQ ID NO: 78) | 1486 | 1557
M78076_PEA_1_T23 (SEQ ID NO: 79) | 1327 | 1398
M78076_PEA_1_T26 (SEQ ID NO: 80) | 1486 | 1557
M78076_PEA_1_T27 (SEQ ID NO: 81) | 3133 | 3204
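The start and end positions in Table 704 can be checked for internal consistency: the same segment should span the same number of nucleotides on every transcript, even where its offset differs. The following is a minimal sketch, assuming the positions are 1-based and inclusive (the text does not state this explicitly). The names `TABLE_704` and `segment_lengths` are illustrative only and do not come from the application.

```python
# Start/end positions of segment M78076_PEA_1_node_30, copied from Table 704.
# Assumption: positions are 1-based and inclusive.
TABLE_704 = {
    "M78076_PEA_1_T2": (1486, 1557),
    "M78076_PEA_1_T3": (1486, 1557),
    "M78076_PEA_1_T5": (1486, 1557),
    "M78076_PEA_1_T13": (1486, 1557),
    "M78076_PEA_1_T15": (1486, 1557),
    "M78076_PEA_1_T23": (1327, 1398),
    "M78076_PEA_1_T26": (1486, 1557),
    "M78076_PEA_1_T27": (3133, 3204),
}


def segment_lengths(table):
    """Return the inclusive length of the segment on each transcript."""
    return {name: end - start + 1 for name, (start, end) in table.items()}


lengths = segment_lengths(TABLE_704)
# A single distinct value means the segment has the same length everywhere.
print(sorted(set(lengths.values())))
```

Under the inclusive-position assumption, every row of Table 704 yields the same length of 72 nucleotides, including the rows for M78076_PEA_1_T23 and M78076_PEA_1_T27, where the segment sits at a different offset.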

Segment cluster M78076_PEA_1_node_31 (SEQ ID NO:682) according to the present invention is supported by 89 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T2 (SEQ ID NO:74), M78076_PEA_1_T3 (SEQ ID NO:75), M78076_PEA_1_T5 (SEQ ID NO:76), M78076_PEA_1_T13 (SEQ ID NO:77), M78076_PEA_1_T15 (SEQ ID NO:78), M78076_PEA_1_T23 (SEQ ID NO:79), M78076_PEA_1_T26 (SEQ ID NO:80) and M78076_PEA_1_T27 (SEQ ID NO:81). Table 705 below describes the starting and ending position of this segment on each transcript.

TABLE 705 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
M78076_PEA_1_T2 (SEQ ID NO: 74) | 1558 | 1585
M78076_PEA_1_T3 (SEQ ID NO: 75) | 1558 | 1585
M78076_PEA_1_T5 (SEQ ID NO: 76) | 1558 | 1585
M78076_PEA_1_T13 (SEQ ID NO: 77) | 1558 | 1585
M78076_PEA_1_T15 (SEQ ID NO: 78) | 1558 | 1585
M78076_PEA_1_T23 (SEQ ID NO: 79) | 1399 | 1426
M78076_PEA_1_T26 (SEQ ID NO: 80) | 1558 | 1585
M78076_PEA_1_T27 (SEQ ID NO: 81) | 3205 | 3232

Segment cluster M78076_PEA_1_node_34 (SEQ ID NO:683) according to the present invention is supported by 103 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T2 (SEQ ID NO:74), M78076_PEA_1_T3 (SEQ ID NO:75), M78076_PEA_1_T5 (SEQ ID NO:76), M78076_PEA_1_T13 (SEQ ID NO:77), M78076_PEA_1_T15 (SEQ ID NO:78) and M78076_PEA_1_T23 (SEQ ID NO:79). Table 706 below describes the starting and ending position of this segment on each transcript.

TABLE 706 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
M78076_PEA_1_T2 (SEQ ID NO: 74) | 1586 | 1693
M78076_PEA_1_T3 (SEQ ID NO: 75) | 1586 | 1693
M78076_PEA_1_T5 (SEQ ID NO: 76) | 1586 | 1693
M78076_PEA_1_T13 (SEQ ID NO: 77) | 1586 | 1693
M78076_PEA_1_T15 (SEQ ID NO: 78) | 1586 | 1693
M78076_PEA_1_T23 (SEQ ID NO: 79) | 1427 | 1534

Segment cluster M78076_PEA_1_node_36 (SEQ ID NO:684) according to the present invention can be found in the following transcript(s): M78076_PEA_1_T2 (SEQ ID NO:74), M78076_PEA_1_T3 (SEQ ID NO:75), M78076_PEA_1_T5 (SEQ ID NO:76), M78076_PEA_1_T13 (SEQ ID NO:77), M78076_PEA_1_T15 (SEQ ID NO:78) and M78076_PEA_1_T23 (SEQ ID NO:79). Table 707 below describes the starting and ending position of this segment on each transcript.

TABLE 707 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
M78076_PEA_1_T2 (SEQ ID NO: 74) | 1953 | 1976
M78076_PEA_1_T3 (SEQ ID NO: 75) | 1694 | 1717
M78076_PEA_1_T5 (SEQ ID NO: 76) | 1953 | 1976
M78076_PEA_1_T13 (SEQ ID NO: 77) | 1694 | 1717
M78076_PEA_1_T15 (SEQ ID NO: 78) | 1694 | 1717
M78076_PEA_1_T23 (SEQ ID NO: 79) | 1535 | 1558

Segment cluster M78076_PEA_1_node_41 (SEQ ID NO:685) according to the present invention can be found in the following transcript(s): M78076_PEA_1_T3 (SEQ ID NO:75) and M78076_PEA_1_T5 (SEQ ID NO:76). Table 708 below describes the starting and ending position of this segment on each transcript.

TABLE 708 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
M78076_PEA_1_T3 (SEQ ID NO: 75) | 2181 | 2192
M78076_PEA_1_T5 (SEQ ID NO: 76) | 2440 | 2451

Segment cluster M78076_PEA_1_node_42 (SEQ ID NO:686) according to the present invention can be found in the following transcript(s): M78076_PEA_1_T2 (SEQ ID NO:74), M78076_PEA_1_T3 (SEQ ID NO:75), M78076_PEA_1_T5 (SEQ ID NO:76), M78076_PEA_1_T15 (SEQ ID NO:78) and M78076_PEA_1_T23 (SEQ ID NO:79). Table 709 below describes the starting and ending position of this segment on each transcript.

TABLE 709 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
M78076_PEA_1_T2 (SEQ ID NO: 74) | 1977 | 1985
M78076_PEA_1_T3 (SEQ ID NO: 75) | 2193 | 2201
M78076_PEA_1_T5 (SEQ ID NO: 76) | 2452 | 2460
M78076_PEA_1_T15 (SEQ ID NO: 78) | 1718 | 1726
M78076_PEA_1_T23 (SEQ ID NO: 79) | 1559 | 1567

Segment cluster M78076_PEA_1_node_43 (SEQ ID NO:687) according to the present invention is supported by 110 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T2 (SEQ ID NO:74), M78076_PEA_1_T3 (SEQ ID NO:75), M78076_PEA_1_T5 (SEQ ID NO:76), M78076_PEA_1_T15 (SEQ ID NO:78) and M78076_PEA_1_T23 (SEQ ID NO:79). Table 710 below describes the starting and ending position of this segment on each transcript.

TABLE 710 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
M78076_PEA_1_T2 (SEQ ID NO: 74) | 1986 | 2047
M78076_PEA_1_T3 (SEQ ID NO: 75) | 2202 | 2263
M78076_PEA_1_T5 (SEQ ID NO: 76) | 2461 | 2522
M78076_PEA_1_T15 (SEQ ID NO: 78) | 1727 | 1788
M78076_PEA_1_T23 (SEQ ID NO: 79) | 1568 | 1629

Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment (in relation to lung cancer), shown in Table 711.

TABLE 711 Oligonucleotides related to this segment

Oligonucleotide name | Overexpressed in cancers | Chip reference
M78076_0_7_0 (SEQ ID NO: 232) | lung malignant tumors | LUN

Segment cluster M78076_PEA_1_node_45 (SEQ ID NO:688) according to the present invention is supported by 132 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T2 (SEQ ID NO:74), M78076_PEA_1_T3 (SEQ ID NO:75), M78076_PEA_1_T5 (SEQ ID NO:76), M78076_PEA_1_T13 (SEQ ID NO:77), M78076_PEA_1_T15 (SEQ ID NO:78) and M78076_PEA_1_T23 (SEQ ID NO:79). Table 712 below describes the starting and ending position of this segment on each transcript.

TABLE 712 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
M78076_PEA_1_T2 (SEQ ID NO: 74) | 2048 | 2110
M78076_PEA_1_T3 (SEQ ID NO: 75) | 2264 | 2326
M78076_PEA_1_T5 (SEQ ID NO: 76) | 2523 | 2585
M78076_PEA_1_T13 (SEQ ID NO: 77) | 1718 | 1780
M78076_PEA_1_T15 (SEQ ID NO: 78) | 1789 | 1851
M78076_PEA_1_T23 (SEQ ID NO: 79) | 1630 | 1692

Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment (in relation to lung cancer), shown in Table 713.

TABLE 713 Oligonucleotides related to this segment

Oligonucleotide name | Overexpressed in cancers | Chip reference
M78076_0_7_0 (SEQ ID NO: 232) | lung malignant tumors | LUN

Segment cluster M78076_PEA_1_node_49 (SEQ ID NO:689) according to the present invention is supported by 129 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T2 (SEQ ID NO:74), M78076_PEA_1_T3 (SEQ ID NO:75), M78076_PEA_1_T5 (SEQ ID NO:76), M78076_PEA_1_T13 (SEQ ID NO:77), M78076_PEA_1_T15 (SEQ ID NO:78) and M78076_PEA_1_T23 (SEQ ID NO:79). Table 714 below describes the starting and ending position of this segment on each transcript.

TABLE 714 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
M78076_PEA_1_T2 (SEQ ID NO: 74) | 2255 | 2290
M78076_PEA_1_T3 (SEQ ID NO: 75) | 2471 | 2506
M78076_PEA_1_T5 (SEQ ID NO: 76) | 2730 | 2765
M78076_PEA_1_T13 (SEQ ID NO: 77) | 1925 | 1960
M78076_PEA_1_T15 (SEQ ID NO: 78) | 2117 | 2152
M78076_PEA_1_T23 (SEQ ID NO: 79) | 1837 | 1872

Segment cluster M78076_PEA_1_node_50 (SEQ ID NO:690) according to the present invention is supported by 125 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T2 (SEQ ID NO:74), M78076_PEA_1_T3 (SEQ ID NO:75), M78076_PEA_1_T5 (SEQ ID NO:76), M78076_PEA_1_T13 (SEQ ID NO:77), M78076_PEA_1_T15 (SEQ ID NO:78) and M78076_PEA_1_T23 (SEQ ID NO:79). Table 715 below describes the starting and ending position of this segment on each transcript.

TABLE 715 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
M78076_PEA_1_T2 (SEQ ID NO: 74) | 2291 | 2329
M78076_PEA_1_T3 (SEQ ID NO: 75) | 2507 | 2545
M78076_PEA_1_T5 (SEQ ID NO: 76) | 2766 | 2804
M78076_PEA_1_T13 (SEQ ID NO: 77) | 1961 | 1999
M78076_PEA_1_T15 (SEQ ID NO: 78) | 2153 | 2191
M78076_PEA_1_T23 (SEQ ID NO: 79) | 1873 | 1911

Segment cluster M78076_PEA_1_node_51 (SEQ ID NO:691) according to the present invention is supported by 123 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): M78076_PEA_1_T2 (SEQ ID NO:74), M78076_PEA_1_T3 (SEQ ID NO:75), M78076_PEA_1_T5 (SEQ ID NO:76), M78076_PEA_1_T13 (SEQ ID NO:77), M78076_PEA_1_T15 (SEQ ID NO:78) and M78076_PEA_1_T23 (SEQ ID NO:79). Table 716 below describes the starting and ending position of this segment on each transcript.

TABLE 716 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
M78076_PEA_1_T2 (SEQ ID NO: 74) | 2330 | 2388
M78076_PEA_1_T3 (SEQ ID NO: 75) | 2546 | 2604
M78076_PEA_1_T5 (SEQ ID NO: 76) | 2805 | 2863
M78076_PEA_1_T13 (SEQ ID NO: 77) | 2000 | 2058
M78076_PEA_1_T15 (SEQ ID NO: 78) | 2192 | 2250
M78076_PEA_1_T23 (SEQ ID NO: 79) | 1912 | 1970

Segment cluster M78076_PEA_1_node_52 (SEQ ID NO:692) according to the present invention can be found in the following transcript(s): M78076_PEA_1_T2 (SEQ ID NO:74), M78076_PEA_1_T3 (SEQ ID NO:75), M78076_PEA_1_T5 (SEQ ID NO:76), M78076_PEA_1_T13 (SEQ ID NO:77), M78076_PEA_1_T15 (SEQ ID NO:78) and M78076_PEA_1_T23 (SEQ ID NO:79). Table 717 below describes the starting and ending position of this segment on each transcript.

TABLE 717 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
M78076_PEA_1_T2 (SEQ ID NO: 74) | 2389 | 2405
M78076_PEA_1_T3 (SEQ ID NO: 75) | 2605 | 2621
M78076_PEA_1_T5 (SEQ ID NO: 76) | 2864 | 2880
M78076_PEA_1_T13 (SEQ ID NO: 77) | 2059 | 2075
M78076_PEA_1_T15 (SEQ ID NO: 78) | 2251 | 2267
M78076_PEA_1_T23 (SEQ ID NO: 79) | 1971 | 1987

Segment cluster M78076_PEA_1_node_53 (SEQ ID NO:693) according to the present invention can be found in the following transcript(s): M78076_PEA_1_T2 (SEQ ID NO:74), M78076_PEA_1_T3 (SEQ ID NO:75), M78076_PEA_1_T5 (SEQ ID NO:76), M78076_PEA_1_T13 (SEQ ID NO:77), M78076_PEA_1_T15 (SEQ ID NO:78), M78076_PEA_1_T23 (SEQ ID NO:79) and M78076_PEA_1_T28 (SEQ ID NO:82). Table 718 below describes the starting and ending position of this segment on each transcript.

TABLE 718 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
M78076_PEA_1_T2 (SEQ ID NO: 74) | 2406 | 2411
M78076_PEA_1_T3 (SEQ ID NO: 75) | 2622 | 2627
M78076_PEA_1_T5 (SEQ ID NO: 76) | 2881 | 2886
M78076_PEA_1_T13 (SEQ ID NO: 77) | 2076 | 2081
M78076_PEA_1_T15 (SEQ ID NO: 78) | 2268 | 2273
M78076_PEA_1_T23 (SEQ ID NO: 79) | 1988 | 1993
M78076_PEA_1_T28 (SEQ ID NO: 82) | 1486 | 1491

Variant protein alignment to the previously known protein:

Sequence name: APP1_HUMAN (SEQ ID NO: 1439)
Sequence documentation:
Alignment of: M78076_PEA_1_P3 (SEQ ID NO: 1350) × APP1_HUMAN (SEQ ID NO: 1439)
Alignment segment 1/1:
Quality: 5132.00 Escore: 0
Matching length: 517 Total length: 517
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:

Sequence name: APP1_HUMAN (SEQ ID NO: 1439)
Sequence documentation:
Alignment of: M78076_PEA_1_P4 (SEQ ID NO: 1351) × APP1_HUMAN (SEQ ID NO: 1439)
Alignment segment 1/1:
Quality: 5223.00 Escore: 0
Matching length: 526 Total length: 526
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:

Sequence name: APP1_HUMAN (SEQ ID NO: 1439)
Sequence documentation:
Alignment of: M78076_PEA_1_P12 (SEQ ID NO: 1352) × APP1_HUMAN (SEQ ID NO: 1439) . . .
Alignment segment 1/1:
Quality: 5223.00 Escore: 0
Matching length: 526 Total length: 526
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:

Sequence name: APP1_HUMAN (SEQ ID NO: 1439)
Sequence documentation:
Alignment of: M78076_PEA_1_P14 (SEQ ID NO: 1353) × APP1_HUMAN (SEQ ID NO: 1439) . . .
Alignment segment 1/1:
Quality: 5672.00 Escore: 0
Matching length: 575 Total length: 575
Matching Percent Similarity: 99.48 Matching Percent Identity: 99.48
Total Percent Similarity: 99.48 Total Percent Identity: 99.48
Gaps: 0
Alignment:

Sequence name: APP1_HUMAN (SEQ ID NO: 1439)
Sequence documentation:
Alignment of: M78076_PEA_1_P21 (SEQ ID NO: 1354) × APP1_HUMAN (SEQ ID NO: 1439) . . .
Alignment segment 1/1:
Quality: 5822.00 Escore: 0
Matching length: 597 Total length: 650
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 91.85 Total Percent Identity: 91.85
Gaps: 1
Alignment:
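The report for M78076_PEA_1_P21 is the only one above in which the "Matching" and "Total" percentages differ, and the two lengths in the same report appear to account for the difference. The following is a minimal sketch, assuming (the application does not state this) that a total percentage is the matching percentage scaled by the ratio of matching length to total length. The function name `total_percent` is illustrative only.

```python
def total_percent(matching_percent, matching_length, total_length):
    """Scale a percentage computed over the matched region to the full alignment.

    Assumption: total % = matching % * matching length / total length.
    """
    return round(matching_percent * matching_length / total_length, 2)


# Values from the M78076_PEA_1_P21 x APP1_HUMAN report.
p21_total_identity = total_percent(100.00, 597, 650)
print(p21_total_identity)
```

Under this assumption the computed value agrees with the reported Total Percent Identity of 91.85, and the 53-position difference between the two lengths is consistent with the single gap listed in the report.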

Sequence name: APP1_HUMAN (SEQ ID NO: 1439)
Sequence documentation:
Alignment of: M78076_PEA_1_P24 (SEQ ID NO: 1355) × APP1_HUMAN (SEQ ID NO: 1439) . . .
Alignment segment 1/1:
Quality: 4791.00 Escore: 0
Matching length: 485 Total length: 485
Matching Percent Similarity: 99.79 Matching Percent Identity: 99.59
Total Percent Similarity: 99.79 Total Percent Identity: 99.59
Gaps: 0
Alignment:

Sequence name: APP1_HUMAN (SEQ ID NO: 1439)
Sequence documentation:
Alignment of: M78076_PEA_1_P2 (SEQ ID NO: 1356) × APP1_HUMAN (SEQ ID NO: 1439)
Alignment segment 1/1:
Quality: 4474.00 Escore: 0
Matching length: 454 Total length: 454
Matching Percent Similarity: 99.56 Matching Percent Identity: 99.34
Total Percent Similarity: 99.56 Total Percent Identity: 99.34
Gaps: 0
Alignment:

Sequence name: APP1_HUMAN (SEQ ID NO: 1439)
Sequence documentation:
Alignment of: M78076_PEA_1_P25 (SEQ ID NO: 1357) × APP1_HUMAN (SEQ ID NO: 1439) . . .
Alignment segment 1/1:
Quality: 4455.00 Escore: 0
Matching length: 448 Total length: 448
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:

Description for Cluster T99080

Cluster T99080 features 14 transcript(s) and 11 segment(s) of interest, the names for which are given in Tables 719 and 720, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 721.

TABLE 719 Transcripts of interest

Transcript Name | Sequence ID No.
T99080_PEA_4_T0 | 83
T99080_PEA_4_T2 | 84
T99080_PEA_4_T4 | 85
T99080_PEA_4_T6 | 86
T99080_PEA_4_T9 | 87
T99080_PEA_4_T10 | 88
T99080_PEA_4_T11 | 89
T99080_PEA_4_T13 | 90
T99080_PEA_4_T14 | 91
T99080_PEA_4_T17 | 92
T99080_PEA_4_T18 | 93
T99080_PEA_4_T19 | 94
T99080_PEA_4_T20 | 95
T99080_PEA_4_T21 | 96

TABLE 720 Segments of interest

Segment Name | Sequence ID No.
T99080_PEA_4_node_1 | 695
T99080_PEA_4_node_6 | 696
T99080_PEA_4_node_11 | 697
T99080_PEA_4_node_19 | 698
T99080_PEA_4_node_20 | 699
T99080_PEA_4_node_3 | 700
T99080_PEA_4_node_5 | 701
T99080_PEA_4_node_8 | 702
T99080_PEA_4_node_13 | 703
T99080_PEA_4_node_15 | 704
T99080_PEA_4_node_18 | 705

TABLE 721 Proteins of interest

Protein Name | Sequence ID No. | Corresponding Transcript(s)
T99080_PEA_4_P1 | 1358 | T99080_PEA_4_T0 (SEQ ID NO: 83)
T99080_PEA_4_P2 | 1359 | T99080_PEA_4_T2 (SEQ ID NO: 84)
T99080_PEA_4_P5 | 1360 | T99080_PEA_4_T6 (SEQ ID NO: 86)
T99080_PEA_4_P8 | 1361 | T99080_PEA_4_T9 (SEQ ID NO: 87)
T99080_PEA_4_P9 | 1362 | T99080_PEA_4_T10 (SEQ ID NO: 88)
T99080_PEA_4_P10 | 1363 | T99080_PEA_4_T11 (SEQ ID NO: 89)
T99080_PEA_4_P12 | 1364 | T99080_PEA_4_T14 (SEQ ID NO: 91)
T99080_PEA_4_P13 | 1365 | T99080_PEA_4_T17 (SEQ ID NO: 92)
T99080_PEA_4_P14 | 1366 | T99080_PEA_4_T18 (SEQ ID NO: 93)
T99080_PEA_4_P15 | 1367 | T99080_PEA_4_T19 (SEQ ID NO: 94)
T99080_PEA_4_P16 | 1368 | T99080_PEA_4_T20 (SEQ ID NO: 95)
T99080_PEA_4_P17 | 1369 | T99080_PEA_4_T21 (SEQ ID NO: 96)

These sequences are variants of the known protein Acylphosphatase, organ-common type isozyme (SwissProt accession identifier ACYO_HUMAN; known also according to the synonyms EC 3.6.1.7; Acylphosphate phosphohydrolase; Acylphosphatase, erythrocyte isozyme), SEQ ID NO: 1440, referred to herein as the previously known protein.

The sequence for protein Acylphosphatase (SEQ ID NO:1440), organ-common type isozyme is given at the end of the application, as "Acylphosphatase, organ-common type isozyme amino acid sequence". Known polymorphisms for this sequence are as shown in Table 722.

TABLE 722 Amino acid mutations for Known Protein

SNP position(s) on amino acid sequence | Comment
19 | G -> R

The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: phosphate metabolism, which are annotation(s) related to Biological Process; and acylphosphatase, which are annotation(s) related to Molecular Function.

The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <dot expasy dot ch/sprot/>; or Locuslink, available from <dot ncbi dot nlm dot nih dot gov/projects/LocusLink/>.

As noted above, cluster T99080 features 14 transcript(s), which were listed in Table 719 above. These transcript(s) encode for protein(s) which are variant(s) of protein Acylphosphatase (SEQ ID NO:1440), organ-common type isozyme. A description of each variant protein according to the present invention is now provided.

Variant protein T99080_PEA_4_P1 (SEQ ID NO:1358) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T99080_PEA_4_T0 (SEQ ID NO:83). The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein T99080_PEA_4_P1 (SEQ ID NO:1358) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 723 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T99080_PEA_4_P1 (SEQ ID NO:1358) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 723 Amino acid mutations

SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
23 | A -> V | Yes

Variant protein T99080_PEA_4_P1 (SEQ ID NO:1358) is encoded by the following transcript(s): T99080_PEA_4_T0 (SEQ ID NO:83), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T99080_PEA_4_T0 (SEQ ID NO:83) is shown in bold; this coding portion starts at position 226 and ends at position 411. The transcript also has the following SNPs as listed in Table 724 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T99080_PEA_4_P1 (SEQ ID NO:1358) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 724 Nucleic acid SNPs

SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
293 | C -> T | Yes
1293 | G -> C | Yes
2034 | A -> G | Yes
2114 | A -> C | Yes
2153 | -> A | No
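The nucleotide SNP positions in Table 724 can be related to the amino acid SNP position in Table 723 through the stated coding portion (positions 226 to 411). The following is a minimal sketch, assuming 1-based inclusive positions and a reading frame that begins at the first nucleotide of the coding portion; neither assumption is stated in the text. The function name `snp_to_codon` is illustrative only.

```python
def snp_to_codon(cds_start, cds_end, snp_pos):
    """Map a 1-based nucleotide position to (amino acid position, base within codon).

    Returns None when the position lies outside the coding portion.
    """
    if not cds_start <= snp_pos <= cds_end:
        return None
    offset = snp_pos - cds_start
    return offset // 3 + 1, offset % 3 + 1


# Coding portion of T99080_PEA_4_T0 as stated in the text: 226 to 411.
# SNP positions are those listed in Table 724.
for pos in (293, 1293, 2034, 2114, 2153):
    print(pos, snp_to_codon(226, 411, pos))
```

Under these assumptions, position 293 falls on the second base of codon 23, which agrees with the amino acid change at position 23 in Table 723; a C -> T change at the second codon position is also what an alanine (GCN) to valine (GTN) substitution requires. The remaining positions in Table 724 lie beyond position 411 and so fall outside the coding portion.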

Variant protein T99080_PEA_4_P2 (SEQ ID NO:1359) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T99080_PEA_4_T2 (SEQ ID NO:84). The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because, although it is a partial protein, both trans-membrane region prediction programs predict that this protein has a trans-membrane region.

Variant protein T99080_PEA_4_P2 (SEQ ID NO:1359) is encoded by the following transcript(s): T99080_PEA_4_T2 (SEQ ID NO:84), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T99080_PEA_4_T2 (SEQ ID NO:84) is shown in bold; this coding portion starts at position 1 and ends at position 192. The transcript also has the following SNPs as listed in Table 725 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T99080_PEA_4_P2 (SEQ ID NO:1359) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 725 Nucleic acid SNPs

SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
1074 | G -> C | Yes
1815 | A -> G | Yes
1895 | A -> C | Yes
1934 | -> A | No

Variant protein T99080_PEA_4_P5 (SEQ ID NO:1360) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T99080_PEA_4_T6 (SEQ ID NO:86). An alignment is given to the known protein (Acylphosphatase (SEQ ID NO:1440), organ-common type isozyme) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between T99080_PEA_4_P5 (SEQ ID NO:1360) and ACYO_HUMAN_V1 (SEQ ID NO: 1441):

1. An isolated chimeric polypeptide encoding for T99080_PEA_4_P5 (SEQ ID NO:1360), comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MPASARLAGAGLLLAFLRALGCAGRAPGLS (SEQ ID NO: 1732) corresponding to amino acids 1-30 of T99080_PEA_4_P5 (SEQ ID NO:1360), and a second amino acid sequence being at least 90% homologous to MAEGNTLISVDYEIFGKVQGVFFRKCHTQAEGKKLGLVGWVQNTDRGTVQGQLQGPISKVRHMQEWLETRGSPKSHIDKANFNNEKVILKLDYSDFQIVK corresponding to amino acids 1-99 of ACYO_HUMAN_V1 (SEQ ID NO:1441), which also corresponds to amino acids 31-129 of T99080_PEA_4_P5 (SEQ ID NO:1360), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a head of T99080_PEA_4_P5 (SEQ ID NO:1360), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MPASARLAGAGLLIAFLRALGCAGRAPGLS (SEQ ID NO: 1732) of T99080_PEA_4_P5 (SEQ ID NO:1360).
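Item 1 above describes T99080_PEA_4_P5 as a 30-residue head followed by amino acids 1-99 of ACYO_HUMAN_V1, which occupy positions 31-129 of the variant. The correspondence between the two numbering schemes is a fixed offset, sketched below under the assumption of 1-based inclusive positions. The function name `known_to_variant` and the constant `SHARED_LENGTH` are illustrative only and do not appear in the application.

```python
def known_to_variant(known_pos, known_start, variant_start, length):
    """Map a 1-based position on the known protein to the variant protein.

    Returns None when the position lies outside the shared region.
    """
    if not known_start <= known_pos < known_start + length:
        return None
    return known_pos - known_start + variant_start


# Shared region per item 1: ACYO_HUMAN_V1 1-99 <-> T99080_PEA_4_P5 31-129.
SHARED_LENGTH = 99
first = known_to_variant(1, 1, 31, SHARED_LENGTH)
last = known_to_variant(99, 1, 31, SHARED_LENGTH)
print(first, last)
```

With these inputs the endpoints of the shared region map to positions 31 and 129 of the variant, matching the range given in item 1. The same helper could describe other comparison reports by changing the start positions and the length.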

It should be noted that the known protein sequence (ACYO_HUMAN (SEQ ID NO:1440)) differs by one or more changes from the sequence given at the end of the application and named as being the amino acid sequence for ACYO_HUMAN_V1 (SEQ ID NO:1441). These changes were previously known to occur and are listed in the table below.

TABLE 726 Changes to ACYO_HUMAN_V1 (SEQ ID NO: 1441)

SNP position(s) on amino acid sequence | Type of change
1 | init_met

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein T99080_PEA_4_P5 (SEQ ID NO:1360) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 727 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T99080_PEA_4_P5 (SEQ ID NO:1360) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 727 Amino acid mutations

SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
23 | A -> V | Yes

Variant protein T99080_PEA_4_P5 (SEQ ID NO:1360) is encoded by the following transcript(s): T99080_PEA_4_T6 (SEQ ID NO:86), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T99080_PEA_4_T6 (SEQ ID NO:86) is shown in bold; this coding portion starts at position 226 and ends at position 612. The transcript also has the following SNPs as listed in Table 728 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T99080_PEA_4_P5 (SEQ ID NO:1360) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 728 Nucleic acid SNPs

SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
293 | C -> T | Yes
697 | A -> G | Yes
777 | A -> C | Yes
816 | -> A | No

Variant protein T99080_PEA_4_P8 (SEQ ID NO:1361) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T99080_PEA_4_T9 (SEQ ID NO:87). An alignment is given to the known protein (Acylphosphatase (SEQ ID NO:1440), organ-common type isozyme) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between T99080_PEA_4_P8 (SEQ ID NO:1361) and ACYO_HUMAN_V1 (SEQ ID NO:1441):

1. An isolated chimeric polypeptide encoding for T99080_PEA_4_P8 (SEQ ID NO:1361), comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence M corresponding to amino acids 1-1 of T99080_PEA_4_P8 (SEQ ID NO:1361), and a second amino acid sequence being at least 90% homologous to QAEGKKLGLVGWVQNTDRGTVQGQLQGPISKVRHMQEWLETRGSPKSHIDKANFNNEKVILKLDYSDFQIVK corresponding to amino acids 28-99 of ACYO_HUMAN_V1 (SEQ ID NO:1441), which also corresponds to amino acids 2-73 of T99080_PEA_4_P8 (SEQ ID NO:1361), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

It should be noted that the known protein sequence (ACYO_HUMAN (SEQ ID NO:1440)) differs by one or more changes from the sequence given at the end of the application and named as being the amino acid sequence for ACYO_HUMAN_V1 (SEQ ID NO:1441). These changes were previously known to occur and are listed in the table below.

TABLE 729 Changes to ACYO_HUMAN_V1 (SEQ ID NO: 1441)

SNP position(s) on amino acid sequence | Type of change
1 | init_met

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.
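The localization statements in this section follow a recurring pattern: "secreted" when both signal-peptide programs predict a signal peptide and neither trans-membrane program predicts a trans-membrane region, "membrane" when both trans-membrane programs predict such a region, and "intracellularly" when neither kind of feature is predicted. The following is a minimal sketch of that pattern as a decision rule. The function name, the boolean inputs, and the "undetermined" fallback are assumptions made for illustration; the application does not say how disagreeing predictions are resolved, nor does it name the programs other than SignalP.

```python
def predict_localization(signal_peptide_calls, transmembrane_calls):
    """Combine two signal-peptide and two trans-membrane predictions.

    Each argument is a pair of booleans, one per prediction program
    (True means the feature was predicted).
    """
    sp_all = all(signal_peptide_calls)
    sp_none = not any(signal_peptide_calls)
    tm_all = all(transmembrane_calls)
    tm_none = not any(transmembrane_calls)

    if tm_all:
        return "membrane"
    if sp_all and tm_none:
        return "secreted"
    if sp_none and tm_none:
        return "intracellularly"
    # Mixed calls: the text gives no rule, so this sketch does not guess.
    return "undetermined"


print(predict_localization((True, True), (False, False)))
print(predict_localization((False, False), (False, False)))
```

Returning an explicit "undetermined" value for mixed predictions keeps the sketch from implying a tie-breaking rule that the text does not provide.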

Variant protein T99080_PEA_4_P8 (SEQ ID NO:1361) is encoded by the following transcript(s): T99080_PEA_4_T9 (SEQ ID NO:87), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T99080_PEA_4_T9 (SEQ ID NO:87) is shown in bold; this coding portion starts at position 162 and ends at position 380. The transcript also has the following SNPs as listed in Table 730 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T99080_PEA_4_P8 (SEQ ID NO:1361) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 730 Nucleic acid SNPs

SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
465 | A -> G | Yes
545 | A -> C | Yes
584 | -> A | No
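The stated start and end of a coding portion determine its length in nucleotides and in codons. The sketch below is illustrative only: it assumes the positions are 1-based and inclusive, and the function name `coding_length` is hypothetical. Whether the stated range includes a stop codon is not specified in the text, so the codon count is reported as-is.

```python
def coding_length(start: int, end: int) -> tuple:
    """Return (nucleotides, codons) for a 1-based, inclusive coding range."""
    nucleotides = end - start + 1
    if nucleotides % 3 != 0:
        raise ValueError("coding range is not a whole number of codons")
    return nucleotides, nucleotides // 3


# Coding portion of T99080_PEA_4_T9 as stated above: positions 162 to 380.
print(coding_length(162, 380))  # (219, 73)
```

The 73 codons obtained for T99080_PEA_4_T9 agree with the 73 amino acids attributed to T99080_PEA_4_P8 in the comparison report (amino acid 1 followed by amino acids 2-73).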

Variant protein T99080_PEA_4_P9 (SEQ ID NO:1362) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T99080_PEA_4_T10 (SEQ ID NO:88). The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because, although it is a partial protein, both trans-membrane region prediction programs predict that this protein has a trans-membrane region.

Variant protein T99080_PEA_4_P9 (SEQ ID NO:1362) is encoded by the following transcript(s): T99080_PEA_4_T10 (SEQ ID NO:88), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T99080_PEA_4_T10 (SEQ ID NO:88) is shown in bold; this coding portion starts at position 1 and ends at position 261. The transcript also has the following SNPs as listed in Table 731 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T99080_PEA_4_P9 (SEQ ID NO:1362) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 731 Nucleic acid SNPs

SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
557 | A -> G | Yes
637 | A -> C | Yes
676 | -> A | No

Variant protein T99080_PEA_4_P10 (SEQ ID NO:1363) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T99080_PEA_4_T11 (SEQ ID NO:89). The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because, although it is a partial protein, both trans-membrane region prediction programs predict that this protein has a trans-membrane region.

Variant protein T99080_PEA_4_P10 (SEQ ID NO:1363) is encoded by the following transcript(s): T99080_PEA_4_T11 (SEQ ID NO:89), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T99080_PEA_4_T11 (SEQ ID NO:89) is shown in bold; this coding portion starts at position 1 and ends at position 240. The transcript also has the following SNPs as listed in Table 732 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T99080_PEA_4_P10 (SEQ ID NO:1363) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 732 Nucleic acid SNPs

SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
269 | G -> T | Yes
592 | A -> G | Yes
672 | A -> C | Yes
711 | -> A | No

Variant protein T99080_PEA_4_P12 (SEQ ID NO:1364) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T99080_PEA_4_T14 (SEQ ID NO:91). The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because, although it is a partial protein, both trans-membrane region prediction programs predict that this protein has a trans-membrane region.

Variant protein T99080_PEA_4_P12 (SEQ ID NO:1364) is encoded by the following transcript(s): T99080_PEA_4_T14 (SEQ ID NO:91), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T99080_PEA_4_T14 (SEQ ID NO:91) is shown in bold; this coding portion starts at position 1 and ends at position 282.

Variant protein T99080_PEA_4_P13 (SEQ ID NO:1365) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T99080_PEA_4_T17 (SEQ ID NO:92). The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because, although it is a partial protein, both trans-membrane region prediction programs predict that this protein has a trans-membrane region.

Variant protein T99080_PEA_4_P13 (SEQ ID NO:1365) is encoded by the following transcript(s): T99080_PEA_4_T17 (SEQ ID NO:92), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T99080_PEA_4_T17 (SEQ ID NO:92) is shown in bold; this coding portion starts at position 1 and ends at position 207.

Variant protein T99080_PEA_4_P14 (SEQ ID NO:1366) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T99080_PEA_4_T18 (SEQ ID NO:93). The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
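The localization reasoning stated for the variant proteins above (intracellular, membrane, secreted) can be summarized as a small decision rule. The following Python sketch is one possible reading of that reasoning and is not the software actually used; the function name `predict_localization`, the boolean encoding of each program's call, and the "undetermined" outcome for disagreeing programs are assumptions made for illustration.

```python
from typing import Sequence


def predict_localization(
    signal_peptide_calls: Sequence[bool],
    transmembrane_calls: Sequence[bool],
) -> str:
    """Combine prediction-program calls into a localization label.

    Each sequence holds one boolean per program: True means the program
    predicted the feature (a signal peptide or a trans-membrane region).
    """
    if not signal_peptide_calls or not transmembrane_calls:
        raise ValueError("at least one call of each kind is required")
    if all(transmembrane_calls):
        # All trans-membrane programs agree on a trans-membrane region.
        return "membrane"
    if not any(transmembrane_calls):
        if all(signal_peptide_calls):
            return "secreted"
        if not any(signal_peptide_calls):
            return "intracellular"
    # Assumption: the text does not say how disagreements are resolved.
    return "undetermined"
```

With two programs of each kind, `predict_localization([True, True], [False, False])` returns "secreted", which corresponds to the reasoning given for T99080_PEA_4_P14.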

Variant protein T99080_PEA_4_P14 (SEQ ID NO:1366) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 733 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T99080_PEA_4_P14 (SEQ ID NO:1366) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 733 Amino acid mutations

SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
23 | A -> V | Yes

Variant protein T99080_PEA_4_P14 (SEQ ID NO:1366) is encoded by the following transcript(s): T99080_PEA_4_T18 (SEQ ID NO:93), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T99080_PEA_4_T18 (SEQ ID NO:93) is shown in bold; this coding portion starts at position 226 and ends at position 480. The transcript also has the following SNPs as listed in Table 734 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T99080_PEA_4_P14 (SEQ ID NO:1366) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 734 Nucleic acid SNPs

SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
293 | C -> T | Yes
776 | A -> G | Yes
856 | A -> C | Yes
895 | -> A | No
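A nucleotide SNP position can be related to an amino acid position once the start of the coding portion is known. The sketch below is illustrative, assuming 1-based inclusive positions and a coding portion read in frame from its first nucleotide; the function name `snp_to_codon` is hypothetical.

```python
def snp_to_codon(snp_pos: int, cds_start: int, cds_end: int):
    """Locate a nucleotide SNP within the coding portion.

    Returns (amino_acid_position, base_within_codon), both 1-based,
    or None if the SNP lies outside the coding portion.
    """
    if not cds_start <= snp_pos <= cds_end:
        return None
    offset = snp_pos - cds_start
    return offset // 3 + 1, offset % 3 + 1


# SNP positions from Table 734; coding portion of T99080_PEA_4_T18 is 226-480.
for position in (293, 776, 856, 895):
    print(position, snp_to_codon(position, 226, 480))
```

Under these assumptions the SNP at nucleotide 293 falls on the second base of codon 23, which agrees with the amino acid change at position 23 listed for T99080_PEA_4_P14; in the standard genetic code a C -> T change at the second codon position converts an alanine codon (GCN) into a valine codon (GTN). The remaining three SNPs lie outside the coding portion.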

Variant protein T99080_PEA_4_P15 (SEQ ID NO:1367) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T99080_PEA_4_T19 (SEQ ID NO:94). The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein T99080_PEA_4_P15 (SEQ ID NO:1367) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 735 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T99080_PEA_4_P15 (SEQ ID NO:1367) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 735 Amino acid mutations

SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
23 | A -> V | Yes

Variant protein T99080_PEA_4_P15 (SEQ ID NO:1367) is encoded by the following transcript(s): T99080_PEA_4_T19 (SEQ ID NO:94), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T99080_PEA_4_T19 (SEQ ID NO:94) is shown in bold; this coding portion starts at position 226 and ends at position 459. The transcript also has the following SNPs as listed in Table 736 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T99080_PEA_4_P15 (SEQ ID NO:1367) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 736 Nucleic acid SNPs

SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
293 | C -> T | Yes
488 | G -> T | Yes
811 | A -> G | Yes
891 | A -> C | Yes
930 | -> A | No

Variant protein T99080_PEA_4_P16 (SEQ ID NO:1368) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T99080_PEA_4_T20 (SEQ ID NO:95). The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein T99080_PEA_4_P16 (SEQ ID NO:1368) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 737 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T99080_PEA_4_P16 (SEQ ID NO:1368) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 737 Amino acid mutations

SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
23 | A -> V | Yes

Variant protein T99080_PEA_4_P16 (SEQ ID NO:1368) is encoded by the following transcript(s): T99080_PEA_4_T20 (SEQ ID NO:95), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T99080_PEA_4_T20 (SEQ ID NO:95) is shown in bold; this coding portion starts at position 226 and ends at position 501. The transcript also has the following SNPs as listed in Table 738 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T99080_PEA_4_P16 (SEQ ID NO:1368) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 738 Nucleic acid SNPs

SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
293 | C -> T | Yes

Variant protein T99080_PEA_4_P17 (SEQ ID NO:1369) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T99080_PEA_4_T21 (SEQ ID NO:96). The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein T99080_PEA_4_P17 (SEQ ID NO:1369) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 739 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T99080_PEA_4_P17 (SEQ ID NO:1369) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 739 Amino acid mutations

SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
23 | A -> V | Yes

Variant protein T99080_PEA_4_P17 (SEQ ID NO:1369) is encoded by the following transcript(s): T99080_PEA_4_T21 (SEQ ID NO:96), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T99080_PEA_4_T21 (SEQ ID NO:96) is shown in bold; this coding portion starts at position 226 and ends at position 426. The transcript also has the following SNPs as listed in Table 740 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T99080_PEA_4_P17 (SEQ ID NO:1369) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 740 Nucleic acid SNPs

SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
293 | C -> T | Yes

As noted above, cluster T99080 features 11 segment(s), which were listed in Table 720 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster T99080_PEA_4_node_1 (SEQ ID NO:695) according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T99080_PEA_4_T0 (SEQ ID NO:83), T99080_PEA_4_T6 (SEQ ID NO:86), T99080_PEA_4_T13 (SEQ ID NO:90), T99080_PEA_4_T18 (SEQ ID NO:93), T99080_PEA_4_T19 (SEQ ID NO:94), T99080_PEA_4_T20 (SEQ ID NO:95) and T99080_PEA_4_T21 (SEQ ID NO:96). Table 741 below describes the starting and ending position of this segment on each transcript.

TABLE 741 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
T99080_PEA_4_T0 (SEQ ID NO: 83) | 1 | 307
T99080_PEA_4_T6 (SEQ ID NO: 86) | 1 | 307
T99080_PEA_4_T13 (SEQ ID NO: 90) | 1 | 307
T99080_PEA_4_T18 (SEQ ID NO: 93) | 1 | 307
T99080_PEA_4_T19 (SEQ ID NO: 94) | 1 | 307
T99080_PEA_4_T20 (SEQ ID NO: 95) | 1 | 307
T99080_PEA_4_T21 (SEQ ID NO: 96) | 1 | 307

Segment cluster T99080_PEA_4_node_6 (SEQ ID NO:696) according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T99080_PEA_4_T17 (SEQ ID NO:92) and T99080_PEA_4_T21 (SEQ ID NO:96). Table 742 below describes the starting and ending position of this segment on each transcript.

TABLE 742 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
T99080_PEA_4_T17 (SEQ ID NO: 92) | 181 | 627
T99080_PEA_4_T21 (SEQ ID NO: 96) | 400 | 846

Segment cluster T99080_PEA_4_node_11 (SEQ ID NO:697) according to the present invention is supported by 7 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T99080_PEA_4_T14 (SEQ ID NO:91) and T99080_PEA_4_T20 (SEQ ID NO:95). Table 743 below describes the starting and ending position of this segment on each transcript.

TABLE 743 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
T99080_PEA_4_T14 (SEQ ID NO: 91) | 260 | 782
T99080_PEA_4_T20 (SEQ ID NO: 95) | 479 | 1001

Segment cluster T99080_PEA_4_node_19 (SEQ ID NO:698) according to the present invention is supported by 59 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T99080_PEA_4_T0 (SEQ ID NO:83), T99080_PEA_4_T2 (SEQ ID NO:84) and T99080_PEA_4_T4 (SEQ ID NO:85). Table 744 below describes the starting and ending position of this segment on each transcript.

TABLE 744 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
T99080_PEA_4_T0 (SEQ ID NO: 83) | 449 | 1736
T99080_PEA_4_T2 (SEQ ID NO: 84) | 230 | 1517
T99080_PEA_4_T4 (SEQ ID NO: 85) | 78 | 1365

Segment cluster T99080_PEA_4_node_20 (SEQ ID NO:699) according to the present invention is supported by 98 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T99080_PEA_4_T0 (SEQ ID NO:83), T99080_PEA_4_T2 (SEQ ID NO:84), T99080_PEA_4_T4 (SEQ ID NO:85), T99080_PEA_4_T6 (SEQ ID NO:86), T99080_PEA_4_T9 (SEQ ID NO:87), T99080_PEA_4_T10 (SEQ ID NO:88), T99080_PEA_4_T11 (SEQ ID NO:89), T99080_PEA_4_T13 (SEQ ID NO:90), T99080_PEA_4_T18 (SEQ ID NO:93) and T99080_PEA_4_T19 (SEQ ID NO:94). Table 745 below describes the starting and ending position of this segment on each transcript.

TABLE 745 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
T99080_PEA_4_T0 (SEQ ID NO: 83) | 1737 | 2175
T99080_PEA_4_T2 (SEQ ID NO: 84) | 1518 | 1956
T99080_PEA_4_T4 (SEQ ID NO: 85) | 1366 | 1804
T99080_PEA_4_T6 (SEQ ID NO: 86) | 400 | 838
T99080_PEA_4_T9 (SEQ ID NO: 87) | 168 | 606
T99080_PEA_4_T10 (SEQ ID NO: 88) | 260 | 698
T99080_PEA_4_T11 (SEQ ID NO: 89) | 295 | 733
T99080_PEA_4_T13 (SEQ ID NO: 90) | 308 | 746
T99080_PEA_4_T18 (SEQ ID NO: 93) | 479 | 917
T99080_PEA_4_T19 (SEQ ID NO: 94) | 514 | 952
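Because a segment is the same stretch of nucleic acid sequence wherever it occurs, its length should be identical on every transcript that contains it. The following Python sketch checks this for the positions listed in Table 745; the dictionary `NODE_20` and the helper `segment_lengths` are illustrative names, and positions are assumed to be 1-based and inclusive.

```python
# (start, end) of segment node_20 on each transcript, copied from Table 745.
NODE_20 = {
    "T99080_PEA_4_T0": (1737, 2175),
    "T99080_PEA_4_T2": (1518, 1956),
    "T99080_PEA_4_T4": (1366, 1804),
    "T99080_PEA_4_T6": (400, 838),
    "T99080_PEA_4_T9": (168, 606),
    "T99080_PEA_4_T10": (260, 698),
    "T99080_PEA_4_T11": (295, 733),
    "T99080_PEA_4_T13": (308, 746),
    "T99080_PEA_4_T18": (479, 917),
    "T99080_PEA_4_T19": (514, 952),
}


def segment_lengths(locations):
    """Return the inclusive length of the segment on each transcript."""
    return {name: end - start + 1 for name, (start, end) in locations.items()}


lengths = segment_lengths(NODE_20)
# A consistent table yields exactly one distinct length.
assert len(set(lengths.values())) == 1
print(set(lengths.values()))  # {439}
```

The same check can be applied to any of the other segment location tables by substituting their rows.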

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.

Segment cluster T99080_PEA_4_node_3 (SEQ ID NO:700) according to the present invention is supported by 40 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T99080_PEA_4_T2 (SEQ ID NO:84), T99080_PEA_4_T9 (SEQ ID NO:87), T99080_PEA_4_T10 (SEQ ID NO:88), T99080_PEA_4_T11 (SEQ ID NO:89), T99080_PEA_4_T14 (SEQ ID NO:91) and T99080_PEA_4_T17 (SEQ ID NO:92). Table 746 below describes the starting and ending position of this segment on each transcript.

TABLE 746 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
T99080_PEA_4_T2 (SEQ ID NO: 84) | 1 | 88
T99080_PEA_4_T9 (SEQ ID NO: 87) | 1 | 88
T99080_PEA_4_T10 (SEQ ID NO: 88) | 1 | 88
T99080_PEA_4_T11 (SEQ ID NO: 89) | 1 | 88
T99080_PEA_4_T14 (SEQ ID NO: 91) | 1 | 88
T99080_PEA_4_T17 (SEQ ID NO: 92) | 1 | 88

Segment cluster T99080_PEA_4_node_5 (SEQ ID NO:701) according to the present invention is supported by 57 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T99080_PEA_4_T0 (SEQ ID NO:83), T99080_PEA_4_T2 (SEQ ID NO:84), T99080_PEA_4_T6 (SEQ ID NO:86), T99080_PEA_4_T10 (SEQ ID NO:88), T99080_PEA_4_T11 (SEQ ID NO:89), T99080_PEA_4_T14 (SEQ ID NO:91), T99080_PEA_4_T17 (SEQ ID NO:92), T99080_PEA_4_T18 (SEQ ID NO:93), T99080_PEA_4_T19 (SEQ ID NO:94), T99080_PEA_4_T20 (SEQ ID NO:95) and T99080_PEA_4_T21 (SEQ ID NO:96). Table 747 below describes the starting and ending position of this segment on each transcript.

TABLE 747 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
T99080_PEA_4_T0 (SEQ ID NO: 83) | 308 | 399
T99080_PEA_4_T2 (SEQ ID NO: 84) | 89 | 180
T99080_PEA_4_T6 (SEQ ID NO: 86) | 308 | 399
T99080_PEA_4_T10 (SEQ ID NO: 88) | 89 | 180
T99080_PEA_4_T11 (SEQ ID NO: 89) | 89 | 180
T99080_PEA_4_T14 (SEQ ID NO: 91) | 89 | 180
T99080_PEA_4_T17 (SEQ ID NO: 92) | 89 | 180
T99080_PEA_4_T18 (SEQ ID NO: 93) | 308 | 399
T99080_PEA_4_T19 (SEQ ID NO: 94) | 308 | 399
T99080_PEA_4_T20 (SEQ ID NO: 95) | 308 | 399
T99080_PEA_4_T21 (SEQ ID NO: 96) | 308 | 399

Segment cluster T99080_PEA_4_node_8 (SEQ ID NO:702) according to the present invention is supported by 12 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T99080_PEA_4_T9 (SEQ ID NO:87), T99080_PEA_4_T10 (SEQ ID NO:88), T99080_PEA_4_T14 (SEQ ID NO:91), T99080_PEA_4_T18 (SEQ ID NO:93) and T99080_PEA_4_T20 (SEQ ID NO:95). Table 748 below describes the starting and ending position of this segment on each transcript.

TABLE 748 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
T99080_PEA_4_T9 (SEQ ID NO: 87) | 89 | 167
T99080_PEA_4_T10 (SEQ ID NO: 88) | 181 | 259
T99080_PEA_4_T14 (SEQ ID NO: 91) | 181 | 259
T99080_PEA_4_T18 (SEQ ID NO: 93) | 400 | 478
T99080_PEA_4_T20 (SEQ ID NO: 95) | 400 | 478

Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment (in relation to lung cancer), shown in Table 749.

TABLE 749 Oligonucleotides related to this segment

Oligonucleotide name | Overexpressed in cancers | Chip reference
T99080_0_0_58896 (SEQ ID NO: 233) | lung malignant tumors | LUN
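Taken together, the segment location tables above describe how a transcript is tiled by consecutive segments. The Python sketch below collects the rows for two transcripts from Tables 745 through 748 and checks that the listed segments follow one another without gaps or overlaps; `TRANSCRIPTS` and `is_contiguous` are illustrative names, and the check says nothing about whether a transcript extends beyond its last listed segment.

```python
# (segment, start, end) rows for two transcripts, taken from Tables 745-748.
TRANSCRIPTS = {
    "T99080_PEA_4_T9": [
        ("node_3", 1, 88),
        ("node_8", 89, 167),
        ("node_20", 168, 606),
    ],
    "T99080_PEA_4_T10": [
        ("node_3", 1, 88),
        ("node_5", 89, 180),
        ("node_8", 181, 259),
        ("node_20", 260, 698),
    ],
}


def is_contiguous(segments):
    """True if the segments start at position 1 and leave no gap or overlap."""
    expected_start = 1
    for _name, start, end in sorted(segments, key=lambda row: row[1]):
        if start != expected_start:
            return False
        expected_start = end + 1
    return True


for transcript, segments in TRANSCRIPTS.items():
    print(transcript, is_contiguous(segments))
```

Both transcripts pass the check, which is consistent with each being assembled from the listed segments in order.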

Segment cluster T99080_PEA_4_node_13 (SEQ ID NO:703) according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T99080_PEA_4_T4 (SEQ ID NO:85). Table 750 below describes the starting and ending position of this segment on each transcript.

TABLE 750 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
T99080_PEA_4_T4 (SEQ ID NO: 85) | 1 | 77

Segment cluster T99080_PEA_4_node_15 (SEQ ID NO:704) according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T99080_PEA_4_T11 (SEQ ID NO:89) and T99080_PEA_4_T19 (SEQ ID NO:94). Table 751 below describes the starting and ending position of this segment on each transcript.

TABLE 751 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
T99080_PEA_4_T11 (SEQ ID NO: 89) | 181 | 294
T99080_PEA_4_T19 (SEQ ID NO: 94) | 400 | 513

Segment cluster T99080_PEA_4_node_18 (SEQ ID NO:705) according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T99080_PEA_4_T0 (SEQ ID NO:83) and T99080_PEA_4_T2 (SEQ ID NO:84). Table 752 below describes the starting and ending position of this segment on each transcript.

TABLE 752 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
T99080_PEA_4_T0 (SEQ ID NO: 83) | 400 | 448
T99080_PEA_4_T2 (SEQ ID NO: 84) | 181 | 229

Variant protein alignment to the previously known protein:

Sequence name: ACYO_HUMAN_V1 (SEQ ID NO: 1441)
Sequence documentation: Alignment of: T99080_PEA_4_P5 (SEQ ID NO: 1360) × ACYO_HUMAN_V1 (SEQ ID NO: 1441) . . .
Alignment segment 1/1:
Quality: 973.00
Escore: 0
Matching length: 99
Total length: 99
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

Sequence name: ACYO_HUMAN_V1 (SEQ ID NO: 1441)
Sequence documentation: Alignment of: T99080_PEA_4_P8 (SEQ ID NO: 1361) × ACYO_HUMAN_V1 (SEQ ID NO: 1441) . . .
Alignment segment 1/1:
Quality: 711.00
Escore: 0
Matching length: 72
Total length: 72
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

Description for Cluster T08446

Cluster T08446 features 2 transcript(s) and 36 segment(s) of interest, the names for which are given in Tables 753 and 754, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 755.

TABLE 753 Transcripts of interest

Transcript Name | Sequence ID No.
T08446_PEA_1_T2 | 97
T08446_PEA_1_T22 | 98

TABLE 754 Segments of interest

Segment Name | Sequence ID No.
T08446_PEA_1_node_2 | 706
T08446_PEA_1_node_9 | 707
T08446_PEA_1_node_15 | 708
T08446_PEA_1_node_17 | 709
T08446_PEA_1_node_25 | 710
T08446_PEA_1_node_29 | 711
T08446_PEA_1_node_38 | 712
T08446_PEA_1_node_43 | 713
T08446_PEA_1_node_51 | 714
T08446_PEA_1_node_52 | 715
T08446_PEA_1_node_55 | 716
T08446_PEA_1_node_57 | 717
T08446_PEA_1_node_59 | 718
T08446_PEA_1_node_62 | 719
T08446_PEA_1_node_63 | 720
T08446_PEA_1_node_3 | 721
T08446_PEA_1_node_5 | 722
T08446_PEA_1_node_7 | 723
T08446_PEA_1_node_12 | 724
T08446_PEA_1_node_13 | 725
T08446_PEA_1_node_19 | 726
T08446_PEA_1_node_21 | 727
T08446_PEA_1_node_23 | 728
T08446_PEA_1_node_27 | 729
T08446_PEA_1_node_32 | 730
T08446_PEA_1_node_34 | 731
T08446_PEA_1_node_45 | 732
T08446_PEA_1_node_46 | 733
T08446_PEA_1_node_48 | 734
T08446_PEA_1_node_54 | 735
T08446_PEA_1_node_58 | 736
T08446_PEA_1_node_60 | 737
T08446_PEA_1_node_61 | 738
T08446_PEA_1_node_64 | 739
T08446_PEA_1_node_65 | 740
T08446_PEA_1_node_66 | 741

TABLE 755 Proteins of interest

Protein Name | Sequence ID No. | Corresponding Transcript(s)
T08446_PEA_1_P18 | 1370 | T08446_PEA_1_T2 (SEQ ID NO: 97)
T08446_PEA_1_P19 | 1371 | T08446_PEA_1_T22 (SEQ ID NO: 98)

These sequences are variants of the known protein Sorting nexin 26 (SwissProt accession identifier SNXQ_HUMAN), SEQ ID NO: 1442, referred to herein as the previously known protein.

Protein Sorting nexin 26 (SEQ ID NO:1442) is known or believed to have the following function(s): May be involved in several stages of intracellular trafficking (By similarity). The sequence for protein Sorting nexin 26 is given at the end of the application, as “Sorting nexin 26 amino acid sequence”.

The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: intracellular protein traffic, which are annotation(s) related to Biological Process; and protein transporter, which are annotation(s) related to Molecular Function.

The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <dot expasy dot ch/sprot/>; or Locuslink, available from <dot ncbi dot nlm dot nih dot gov/projects/LocusLink/>.

As noted above, cluster T08446 features 2 transcript(s), which were listed in Table 753 above. These transcript(s) encode for protein(s) which are variant(s) of protein Sorting nexin 26 (SEQ ID NO:1442). A description of each variant protein according to the present invention is now provided.

Variant protein T08446_PEA_1_P18 (SEQ ID NO:1370) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T08446_PEA_1_T2 (SEQ ID NO:97). An alignment is given to the known protein (Sorting nexin 26 (SEQ ID NO:1442)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between T08446_PEA_1_P18 (SEQ ID NO:1370) and SNXQ_HUMAN (SEQ ID NO:1442):

1. An isolated chimeric polypeptide encoding for T08446_PEA_(—)1_P18(SEQ ID NO:1370), comprising a first amino acid sequence being at least90% homologous toMLSLSLCSHLWGPLILSALQARSTDSLDGPGEGSVQPLPTAGGPSVKGKPGKRLSAPRGPFPRLADCAHFHYENVDFGHIQLLLSPDREGPSLSGENELVFGVQVTCQGRSWPVLRSYDDFRSLDAHLHRCIFDRRFSCLPELPPPPEGARAAQMLVPLLLQYLETLSGLVDSNLNCGPVLTWME corresponding to amino acids1-185 of SNXQ_HUMAN (SEQ ID NO:1442), which also corresponds to aminoacids 1-185 of T08446_PEA_(—)1_P18 (SEQ ID NO:1370), and a second aminoacid sequence being at least 70%, optionally at least 80%, preferably atleast 85%, more preferably at least 90% and most preferably at least 95%homologous to a polypeptide having the sequenceLDNHGRRLLLSEEASLNIPAVAAAHVIKRYTAQAPDELSFEVGDIVSVIDMPPTEDRSWWRGKRGFQVGFFPSECVELFTERPGPGLKADADGPPCGIPAPQGISSLTSAVPRPRGKLAGLLRTFMRSRPSRQRLRQRGILRQRVFGCDLGEHLSNSGQDVPQVLRCCSEFIEAHGVVDGIYRLSGVSSNIQRLRHEFDSERIPELSGPAFLQDIHSVSSLCKLYFRELPNPLLTYQLYGKFSEAMSVPGEEERLVRVHDVIQQLPPPHYRTLEYLLRHLARMARHSANTSMHARNLAIVWAPNLLRSMELESVGMGGAAAFREVRVQSVVVEFLLTHVDVLFSDTFTSAGLDPAGRCLLPRPKSLAGSCPSTRLLTLEEAQARTQGRLGTPTEPTTPKAPASPAERRKGERGEKQRKPGGSSWKTFFALGRGPSVPRKKPLPWLGGTRAPPQPSGSRPDTVTLRSAKSEESLSSQASGAGLQRLHRLRRPHSSSDAFPVGPAPAGSCESLSSSSSSESSSSESSSSSSESSAAGLGALSGSPSHRTSAWLDDGDELDFSPPRCLEGLRGLDFDPLTFRCSSPTPGDPAPPASPAPPAPASAFPPRVTPQAISPRGPTSPASPAALDISEPLAVSVPPAVLELLGAGGAPASATPTPALSPGRSLRPHLIPLLLRGAEAPLTDACQQEMCSKLRGAQGPLGPDMESPLPPPPLSLLRPGGAPPPPPKNPARLMALALAERAQQVAEQQSQQECGGTPPASQSPFHRSLSLEVGGEPLGTSGSGPPPNSLAHPGAWVPGPPPYLPRQQSDGSLLRSQRPMGTSRRGLRGPAQVSAQLRAGGGGRDAPEAAAQSPCSVPSQVPTPGFFSPAPRECLPPFLGVPKPGLYPLGPPSFQPSSPAPVWRSSLGPPAPLDRGENLYYEIGASEGSPYSGPTRSWSPFRSMPPDRLNASYGMLGQSPPLHRSPDFLLSYPPAPSCFPPDHLGYSAPQHPARRPTPPEPLYVNLALGPRGPSPASSSSSSPPAHPRSRSDPGPPVPRLPQKQRAPWGPRTPHRVPGPWGPPEPLLLYRAAPPAYGRGGELHRGSLYRNGGQRGEGAGPPPPYPTPSWSLHSEGQTRSYC (SEQ ID NO: 1733)corresponding to amino acids 186-1305 of T08446_PEA_(—)1_P18 (SEQ IDNO:1370), wherein said first amino acid sequence and second amino acidsequence are 
contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of T08446_PEA_(—)1_P18(SEQ ID NO:1370), comprising a polypeptide being at least 70%,optionally at least about 80%, preferably at least about 85%, morepreferably at least about 90% and most preferably at least about 95%homologous to the sequenceLDNHGRRLLLSEEASLNIPAVAAAHVIKRYTAQAPDELSFEVGDIVSVIDMPPTEDRSWWRGKRGFQVGFFPSECVELFTERPGPGLKADADGPPCGIPAPQGISSLTSAVPRPRGKLAGLLRTFMRSRPSRQRLRQRGILRQRVFGCDLGEHLSNSGQDVPQVLRCCSEFIEAHGVVDGIYRLSGVSSNIQRLRHEFDSERIPELSGPAFLQDIHSVSSLCKLYFRELPNPLLTYQLYGKFSEAMSVPGEEERLVRVHDVIQQLPPPHYRTLEYLLRHLARMARHSANTSMHARNLAIVWAPNLLRSMELESVGMGGAAAFREVRVQSVVVEFLLTHVDVLFSDTFTSAGLDPAGRCLLPRPKSLAGSCPSTRLLTLEEAQARTQGRLGTPTEPTTPKAPASPAERRKGERGEKQRKPGGSSWKTFFALGRGPSVPRKKPLPWLGGTRAPPQPSGSRPDTVTLRSAKSEELSSQASGAGLQRLHRLRRPHSSSDAFPVGPAPAGSCESLSSSSSSESSSSESSSSSSESSAAGLGALSGSPSHRTSAWLDDGDELDFSPPRCLEGLRGLDFDPLTFRCSSPTPGDPAPPASPAPPAPASAFPPRVTPQAISPRGPTSPASPAALDISEPLAVSVPPAVLELLGAGGAPASATPTPALSPGRSLRPHLIPLLLRGAEAPLTDACQQEMCSKLRGAQGPLGPDMESPLPPPPLSLLRPGGAPPPPPKNPARLMALALAERAQQVAEQQSQQECGGTPPASQSPFHRSLSLEVGGEPLGTSGSGPPPNSLAHPGAWVPGPPPYLPRQQSDGSLLRSQRPMGTSRRGLRGPAQVSAQLRAGGGGRDAPEAAAQSPCSVPSQVPTPGFFSPAPRECLPPFLGVPKPGLYPLGPPSFQPSSPAPVWRSSLGPPAPLDRGENLYYEIGASEGSPYSGPTRSWSPFRSMPPDRLNASYGMLGQSPPLHRSPDFLLSYPPAPSCFPPDHLGYSAPQHPARRPTPPEPLYVNLALGPRGPSPASSSSSSPPAHPRSRSDPGPPVPRLPQKQRAPWGPRTPHRVPGPWGPPEPLLLYRAAPPAYGRGGELHRGSLYRNGGQRGEGAGPPPPYPTPSWSLHSEGQTRSYC (SEQ ID NO: 1733) inT08446_PEA_(—)1_P18 (SEQ ID NO:1370).

Comparison report between T08446_PEA_1_P18 (SEQ ID NO:1370) and Q9NT23 (SEQ ID NO:1443):

1. An isolated chimeric polypeptide encoding for T08446_PEA_(—)1_P18(SEQ ID NO:1370), comprising a first amino acid sequence being at least70%, optionally at least 80%, preferably at least 85%, more preferablyat least 90% and most preferably at least 95% homologous to apolypeptide having the sequenceMLSLSLCSHLWGPLILSALQARSTDSLDGPGEGSVQPLPTAGGPSVKGKPGKRLSAPRGPFPRLADCAHFHYENVDFGHIQLLLSPDREGPSLSGENELVFGVQVTCQGRSWPVLRSYDDFRSLDAHLHRCIFDRRFSCLPELPPPPEGARAAQMLVPLLLQYLETLSGLVDSNLNCGPVLTWMELDNHGRRLLLSEEASLNIPAVAAAHVIKRYTAQAPDELSFEVGDIVSVIDMPPTEDRSWWRGKRGFQVGFFPSECVELFTERPGPGLKADADGPPCGIPAPQGISSLTSAVPRPRGKLAGLLRTFMRSRPSRQRLRQRGILRQRVFGCDLGEHLSNSGQDVPQVLRCCSEFIEAHGVVDGIYRLSGVSSNIQRLRHEFDSERIPELSGPAFLQDIHSVSSLCKLYFRELPNPLLTYQLYGKFSEAMSVPGEEERLVRV (SEQ ID NO: 1734) corresponding to amino acids 1-443 ofT08446_PEA_(—)1_P18 (SEQ ID NO:1370), a second amino acid sequence beingat least 90% homologous toHDVIQQLPPPHYRTLEYLLRHLARMARHSANTSMHARNLAIVWAPNLLRSMELESVGMGGAAAFREVRVQSVVVEFLLTHVDVLFSDTFTSAGLDPAGRCLLPRPKSLAGSCPSTRLLTLEEAQARTQGRLGTPTEPTTPKAPASPAERRKGERGEKQRKPGGSSWKTFFALGRGPSVPRKKPLPWLGGTRAPPQPSGSRPDTVTLRSAKSEESLSSQASGAGLQRLHRLRRPHSSSDAFPVGPAPAGSCESLSSSSSSESSSSSSESSAAGLGALSGSPSHRTSAWLDDGDELDFSPPRCLEGLRGLDFDPLTFRCSSPTPGDPAPPASPAPPAPASAFPPPVTPQAISPRGPTSPASPAALDISEPLAVSVPPAVLELLGAGGAPASATPTPALSPGRSLRPHLIPLLLRGAEAPLTDACQQEMCSKLRGAQGLGPDMESPLPPPPLSLLRPGGAPPPPPKNPARLMALALAERAQQVAEQQSQQECGGTPPASQSPFHRSLSLEVGGEPLGTSGSGPPPNSLAHPGAWVPGPPPYLPRQQSDGSLLRSQRPMGTSRRGLRGPAQVSAWLRAGGGGRDAPEAAAQSPCSVPSQVPTPGFFFSPAPRECLPPFLGVPKPGLYPLGPPSFQPSSPAPVWRSSLGPPAPLDRGENLYYEIGASEGSPYSG corresponding to amino acids 1-674 ofQ9NT23 (SEQ ID NO:1443), which also corresponds to amino acids 444-1117of T08446_PEA_(—)1_P18 (SEQ ID NO:1370), a bridging amino acid Pcorresponding to amino acid 1118 of T08446_PEA_(—)1_P18 (SEQ IDNO:1370), and a third amino acid sequence being at least 90% 
homologous to TRSWSPFRSMPPDRLNASYGMLGQSPPLHRSPDFLLSYPPAPSCFPPDHLGYSAPQHPARRPTPPEPLYVNLALGPRGPSPASSSSSSPPAHPRSRSDPGPPVPRLPQKQRAPWGPRTPHRVPGPWGPPEPLLLYRAAPPAYGRGGELHRGSLYRNGGQRGEGAGPPPPYPTPSWSLHSEGQTRSYC corresponding to amino acids 676-862 of Q9NT23 (SEQ ID NO:1443), which also corresponds to amino acids 1119-1305 of T08446_PEA_1_P18 (SEQ ID NO:1370), wherein said first amino acid sequence, second amino acid sequence, bridging amino acid and third amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a head of T08446_PEA_1_P18 (SEQ ID NO:1370), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MLSLSLCSHLWGPLILSALQARSTDSLDGPGEGSVQPLPTAGGPSVKGKPGKRLSAPRGPFPRLADCAHFHYENVDFGHIQLLLSPDREGPSLSGENELVFGVQVTCQGRSWPVLRSYDDFRSLDAHLHRCIFDRRFSCLPELPPPPEGARAAQMLVPLLLQYLETLSGLVDSNLNCGPVLTWMELDNHGRRLLLSEEASLNIPAVAAAHVIKRYTAQAPDELSFEVGDIVSVIDMPPTEDRSWWRGKRGFQVGFFPSECVELFTERPGPGLKADADGPPCGIPAPQGISSLTSAVPRPRGKLAGLLRTFMRSRPSRQRLRQRGILRQRVFGCDLGEHLSNSGQDVPQVLRCCSEFIEAHGVVDGIYRLSGVSSNIQRLRHEFDSERIPELSGPAFLQDIHSVSSLCKLYFRELPNPLLTYQLYGKFSEAMSVPGEEERLVRV (SEQ ID NO: 1734) of T08446_PEA_1_P18 (SEQ ID NO:1370).

Comparison report between T08446_PEA_1_P18 (SEQ ID NO:1370) and Q96CP3 (SEQ ID NO:1444):

1. An isolated chimeric polypeptide encoding for T08446_PEA_l_P18 (SEQID NO:1370), comprising a first amino acid sequence being at least 70%,optionally at least 80%, preferably at least 85%, more preferably atleast 90% and most preferably at least 95% homologous to a polypeptidehaving the sequenceMLSLSLCHLWGPLILSALQARSTDSLDPGPGEGVQPLPTAGGPSVKGKPGKRLSAPRGPRLADCAHFHYENVDFGHIQLLLSPDREGPSLSGENELVFGVQVTCQGRSWPVLRSYDDFRSLDAHLHRCIFDRRFSCLPELPPPPEGARAAQMLVPLLLQYLETLSGLVDSNLNCGPVLTWMELDNHGRRLLLSEEASLNIPAVAAAHVIKRYTAQAPDELSFEVGDIVSVIDMPPTEDRSWWRGKRGFQVGFFPSECVELFTERPGPGLKADADGPPCGIPAPQGISSLTSAVPRPRGKLAGLLRTFMRSRPSRQRLRQRGILRQRVFGCDLGEHLSNSGQDVPQVLRCCSEFIEAHGVVDGIYRLSGVSSNIQRLRHEFDSERIPELSGPAFLQDIHSVSSLCKLYFRELPNPLLTYQLYGKFSEAMSVPGEEERLVRVHDVIQQLPPPHYRTLEYLLRHLARMARHSANTSMHARNLAIVWAPNLLRSMELESVGMGGAAAFREVRVQSVVVEFLLTHVDVLFSDTFTSAGLDPAGRCLLPRPKSLAGSCPSTRLLTLEEAQARTQGRLGTPTEPTTPKAPASPAERRKGERGEKQRKPGGSSWKTFFALGRGPSVPKKPLPWLGGTRAPPQPSGSRPDTVTLRSAKSEESLSSQASGAGLQRLHRLRRPHSSSDAFPVGPAPAGSCESLSSSSSSESSSSESSSSSSESSAAGLGALSGSPSHRTSAWLDDGDELDFSPPRCLEGLRGLDFDPLTFRCSSPTPGDPAPPASPAPPAPASAFPPRVTPQAISPRGPTSPASPAALDISEPLAVSVPPAVLELLGAGGAPASATPTPALSPGRSLRPHLIPLLLRGAEAPLTDACQQEMCSKLRGAQGPLGPDMESPLPPPPLSLLRPGGAPPPPPKNPARLMALALAERAQQVAEQQSQQECGGTPPASQSPFHRSLSLEVGGEPLGTSGSGPPPNSLAHPGAWVPGPPPYLPRQQSDGSLLRSQRPMGTSRRG corresponding to amino acids 1-1010 of T08446_PEA_l_P18 (SEQ IDNO:1370), and a second amino acid sequence being at least 90% homologoustoLRGPAQVSAQLRAGGGGRDAPEAAAQSPCSVPSQVPTPGFFSPAPRECLPPFLGVPKPGLYPLGPPSFQPSSPAPVWRSSLGPPAPLDRGENLYYEIGASEGSPYSGPTRSWSPFRSMPPDLNASYGMLGQSPPLHRSPDFLLSYPPAPSCFPPDHLGYSAPQHPARRPTPPEPLYVNLALGPRGPSPASSSSSSPPAHPRSRSDPGPPVPRLPQKQRAPWGPRTPHRVPGPWGPPEPLLLYRAAPPAYGRGGELHRGSLYRNGGQRGEGAGPPPPYPTPSWSLHSEGQTRSYC corresponding to amino acids 1-295 of Q96CP3 (SEQ ID NO:1444),which also corresponds to amino acids 1011-1305 of T08446_PEA_(—)1_P18(SEQ ID NO:1370), wherein said first amino acid and second amino acidsequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a head of T08446_PEA_(—)1_P18(SEQ ID NO:1370), comprising a polypeptide being at least 70%,optionally at least about 80%, preferably at least about 85%, morepreferably at least about 90% and most preferably at least about 95%homologous to the sequenceMLSLSLCSHLWGPLILSALQARSTDSLDGPGEGSVQPLPTAGGPSVKGKPGKRLSAPRGPFPRLADCAHFHYENVDFGHIQLLLSPDREGPSLSGENELVFGVQVTCQGRSWPVLRSYDDFRSLDAHLHRCIFDRRFSCLPELPPPPEGARAAQMLVPLLLQYLETLSGLVDSNLNCGPVLTWMELDNHGRRLLLSEEASLNIPAVAAAHIKRYTAQAPDELSFEVGDIVSVIDMPPTEDRSWWRGKRGFQVGFFPSECVELFTERPGPGLKADADGPPCGIPAPQGISSLTSAVPRPRGKLAGLLRTFMRSRPSRQRLRQRGILRQRVFGCDLGEHLSNSGQDVPQVLRCCSEFIEAHGVVDGIYRLSGVSSNIQRLRHEFDSERIPELSGPAFLQDIHSVSSLCKLYFRELPNPLLTYQLYGKFSEAMSVPGEEERLVRVHDVIQQLPPPHYRTLEYLLRHLARMARHSANTSMHARNLAIVWAPNLLRSMELESVGMGGAAAFREVRVQSVVVEFLLTHVDVLFSDTFTSAGLDPAGRCLLPRPKSLAGSCPSTRLLTLEEAQARTQGRLGTPTEPTTPKAPASPAERRKGERGEKQRKPGGSSWKTFFALGRGPSVPRKKPLPWLGGTRAPPQPSGSRPDTVTLRSAKSEESLSSQASGAGLQRLHRLRRPHSSSDAFPVGPAPAGSCESLSSSSSSESSSSESSSSSSESSAAGLGALSGSPSHRTSAWLDDGDELDFSPPRCLEGLRGLDFDPLTFRCSSPTPGDPAPASPASPAPPAPASAFPPRVTPQAISPRGPTSPASPAALDISEPLAVSVPPAVLELLGAGGAPASATPTPALSPGRSLRPHLIPLLLRGAEAPLTDACQQEMCSKLRGAQGPLGPDMESPLPPPPLSLLRPGGAPPPPPKNPARLMALALAERAQQVAEQQSQQECGGTPPASQSPFHRSLSLEVGGEPLGTSGSGPPPNSLAHPGAWVPGPPPYLPRQQSDGSLLRSQRPMGTSRRG of T08446_PEA_(—)1_P18 (SEQ ID NO:1370).

Comparison report between T08446_PEA_1_P18 (SEQ ID NO:1370) and BAC86902 (SEQ ID NO:1445):

1. An isolated chimeric polypeptide encoding for T08446_PEA_(—)1_P18(SEQ ID NO:1370), comprising a first amino acid sequence being at least70%, optionally at least 80%, preferably at least 85%, more preferablyat least 90% and most preferably at least 95% homologous to apolypeptide having the sequenceMLSLSLCSHLWGPLILSALQARSTDSLDGPGEGSVQPLPTAGGPSVKGKPGKRLSAPRGPFPRLADCAHFHYENVDFGHIQLLLSPDREGPSLSGENELVFGVQVTCQGRSWPVLRYDDFRSLDAHLHRCIFDRRFSCLPELPPPPEGARAAQ corresponding to amino acids 1-154 of T08446_PEA_(—)1_P18(SEQ ID NO:1370), a second amino acid sequence being at least 90%homologous toMLVPLLLQYLETLSGLVDSNLNCGPVLTWMELDNHGRRLLLSEEASLNIPAVAAAHVIKRYTAQAPDELSFEVGDIVSVIDMPPTEDRSWWRGKRGFQVGFFPSECVELFTERPGPGLKADADGPPCGIPAPQGISSLTSAVPRPRGKLAGLLRTFMRSRPSRQRLRQRGILRQRVFGCDLGEHLSNSGQDVPQVLRCCSEFIEAHGVVDGIYRLSGVSSNIQRLRHEFDSERIPELSGPAFLQDIHSVSSLCKLYFRELPNPLLTYQLYGKFSEAMSVPGEEERLVRVHDVIQQLPPPHYRTLEYLLRHLARMARHSANTSMHARNLAIVWAPNLLRSMELESVGMGGAAAFREVRVQSVVVEFLLTHVDVLFSDTFTSAGLDPAGRCLLPRPKSLAGSCPSTRLLTLEEAQARTQGRLGTPTEPTTPKAPASPAERRKGERGEKQRKPGGSSWKTFFALGRGPSVPRKKPLPWLGGTRAPPQPSGSRPDTVTLRSAKSEESLSSQASGAGLQRLHRLRRPHSSSDAFPVGPAPAGSCESLSSSSSSESSSSESSSSSSESSAAGLGALSGSPSHRTSAWLDDGDELDFSPPRCLEGLRGLDFDPLTFRCSSPTPGDPAPPASPAPPAPASAFPPRVTPQAISPRGPTSPASPAALDISEPLAVSVPPAVLELLGAGGAPASATPTPALSPGRSLRPHLIPLLLRGAEAPLTDACQQEMCSKLRGAQGPLGPDMESPLPPPPLSLLRPGGAPPPPKNPARLMALALAERAQQVAEQQSQQECGGTPPASQSPFHRSLSLEVGGEPLGTSGSGPPPNSLAHPGAWVPGPPPYLPRQQSDGSLLRSQRPMGTSRRGLRGPA corresponding to amino acids 1-861 of BAC86902 (SEQ ID NO:1445), whichalso corresponds to amino acids 155-1015 of T08446_PEA_(—)1_P18 (SEQ IDNO:1370), a third amino acid sequence being at least 70%, optionally atleast 80%, preferably at least 85%, more preferably at least 90% andmost preferably at least 95% homologous to a polypeptide having thesequence QVSAQLRAGGGGRDAPEAAAQSPCSVPS corresponding to amino acids1016-1043 of T08446_PEA_(—)1_P18 (SEQ ID NO:1370), a fourth amino acidsequence being at least 90% homologous 
to QVPTPGFFSPAPRECLPPFLGVPKPGLYPLGPPSFQPSSPAPVWRSSLGPPAPLDRGENLYYEIGASEGSPYSGPTRSWSPFRSMPPDRLNASYGMLGQSPPLHRSPDFLLSYPPAPSCFPPDHLGYS corresponding to amino acids 862-989 of BAC86902 (SEQ ID NO:1445), which also corresponds to amino acids 1044-1171 of T08446_PEA_1_P18 (SEQ ID NO:1370), and a fifth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence APQHPARRPTPPEPLYVNLALGPRGPSPASSSSSSPPAHPRSRSDPGPPVPRLPQKQRAPWGPRTPHRVPGPWGPPEPLLLYRAAPPAYGRGGELHRGSLYRNGGQRGEGAGPPPPYPTPSWSLHSEGQTRSYC corresponding to amino acids 1172-1305 of T08446_PEA_1_P18 (SEQ ID NO:1370), wherein said first amino acid sequence, second amino acid sequence, third amino acid sequence, fourth amino acid sequence and fifth amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a head of T08446_PEA_1_P18 (SEQ ID NO:1370), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MLSLSLCSHLWGPLILSALQARSTDSLDGPGEGSVQPLPTAGGPSVKGKPGKRLSAPRGPFPRLADCAHFHYENVDFGHIQLLLSPDREGPSLSGENELVFGVQVTCQGRSWPVLRSYDDFRSLDAHLHRCIFDRRFSCLPELPPPPEGARAAQ of T08446_PEA_1_P18 (SEQ ID NO:1370).

3. An isolated polypeptide encoding for an edge portion of T08446_PEA_1_P18 (SEQ ID NO:1370), comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for QVSAQLRAGGGGRDAPEAAAQSPCSVPS, corresponding to T08446_PEA_1_P18 (SEQ ID NO:1370).

4. An isolated polypeptide encoding for a tail of T08446_PEA_1_P18 (SEQ ID NO:1370), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence APQHPARRPTPPEPLYVNLALGPRGPSPASSSSSSPPAHPRSRSDPGPPVPRLPQKQRAPWGPRTPHRVPGPWGPPEPLLLYRAAPPAYGRGGELHRGSLYRNGGQRGEGAGPPPPYPTPSWSLHSEGQTRSYC in T08446_PEA_1_P18 (SEQ ID NO:1370).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
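The decision rule stated above (secreted when both signal-peptide predictors are positive and neither trans-membrane predictor is positive) can be sketched as follows. This is only an illustration: the function name, the boolean inputs and the "undetermined" fallback are assumptions, and the sketch does not call SignalP or any other real prediction program.

```python
def predict_localization(signal_peptide_calls, transmembrane_calls):
    """Apply the rule described in the text to lists of boolean predictor calls.

    signal_peptide_calls: one bool per signal-peptide prediction program.
    transmembrane_calls:  one bool per trans-membrane prediction program.
    Both inputs are hypothetical stand-ins for real program outputs.
    """
    # Without any predictor output there is nothing to base a call on.
    if not signal_peptide_calls or not transmembrane_calls:
        return "undetermined"
    # Secreted: every signal-peptide program agrees, no TM program fires.
    if all(signal_peptide_calls) and not any(transmembrane_calls):
        return "secreted"
    return "undetermined"


# Two signal-peptide programs both positive, two TM programs both negative.
call = predict_localization([True, True], [False, False])
```

The fallback label is a placeholder; the specification does not state here what call is made when the programs disagree.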

Variant protein T08446_PEA_1_P18 (SEQ ID NO:1370) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 756 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T08446_PEA_1_P18 (SEQ ID NO:1370) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 756 Amino acid mutations

SNP position(s) on amino acid sequence   Alternative amino acid(s)   Previously known SNP?
714                                      S -> C                      Yes
1000                                     S -> N                      No
1273                                     R -> S                      No
1274                                     N -> H                      No

Variant protein T08446_PEA_1_P18 (SEQ ID NO:1370) is encoded by the following transcript(s): T08446_PEA_1_T2 (SEQ ID NO:97), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T08446_PEA_1_T2 (SEQ ID NO:97) is shown in bold; this coding portion starts at position 228 and ends at position 4142. The transcript also has the following SNPs as listed in Table 757 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T08446_PEA_1_P18 (SEQ ID NO:1370) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 757 Nucleic acid SNPs

SNP position on nucleotide sequence   Alternative nucleic acid   Previously known SNP?
212                                   G -> A                     Yes
431                                   C -> T                     Yes
809                                   C -> T                     Yes
1547                                  G -> A                     Yes
2368                                  C -> G                     Yes
3226                                  G -> A                     No
3284                                  C -> G                     Yes
3377                                  C -> T                     Yes
4046                                  A -> C                     No
4047                                  A -> C                     No
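The nucleotide positions of Table 757 can be related to the amino acid positions of Table 756 through the stated coding portion (positions 228 to 4142 of the transcript). The sketch below assumes 1-based, inclusive coordinates and a coding frame that begins at position 228; the specification does not spell out this convention, and the function name is hypothetical.

```python
def codon_index(nt_pos, cds_start, cds_end):
    """Return the 1-based amino acid position whose codon contains nt_pos.

    Assumes 1-based inclusive transcript coordinates and a reading frame
    starting at cds_start. Returns None outside the coding portion.
    """
    if nt_pos < cds_start or nt_pos > cds_end:
        return None
    return (nt_pos - cds_start) // 3 + 1


# Coding portion of transcript T08446_PEA_1_T2 as stated in the text.
CDS_START, CDS_END = 228, 4142

# Nucleotide SNP positions taken from Table 757.
table_757_positions = [212, 431, 809, 1547, 2368, 3226, 3284, 3377, 4046, 4047]
mapped = {p: codon_index(p, CDS_START, CDS_END) for p in table_757_positions}

# Number of codons spanned by the coding portion.
n_codons = (CDS_END - CDS_START + 1) // 3
```

Under these assumptions, positions 2368, 3226, 4046 and 4047 fall in codons 714, 1000, 1273 and 1274, which are the four amino acid positions listed in Table 756, and position 212 lies upstream of the coding portion. The coding portion spans 1305 codons, in line with the 1305 amino acids recited for this variant.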

Variant protein T08446_PEA_1_P19 (SEQ ID NO:1371) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T08446_PEA_1_T22 (SEQ ID NO:98). The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein T08446_PEA_1_P19 (SEQ ID NO:1371) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 758 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T08446_PEA_1_P19 (SEQ ID NO:1371) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 758 Amino acid mutations

SNP position(s) on amino acid sequence   Alternative amino acid(s)   Previously known SNP?
194                                      D -> G                      Yes

Variant protein T08446_PEA_1_P19 (SEQ ID NO:1371) is encoded by the following transcript(s): T08446_PEA_1_T22 (SEQ ID NO:98), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T08446_PEA_1_T22 (SEQ ID NO:98) is shown in bold; this coding portion starts at position 228 and ends at position 965. The transcript also has the following SNPs as listed in Table 759 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T08446_PEA_1_P19 (SEQ ID NO:1371) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 759 Nucleic acid SNPs

SNP position on nucleotide sequence   Alternative nucleic acid   Previously known SNP?
212                                   G -> A                     Yes
431                                   C -> T                     Yes
808                                   A -> G                     Yes

As noted above, cluster T08446 features 36 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster T08446_PEA_1_node_2 (SEQ ID NO:706) according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T08446_PEA_1_T2 (SEQ ID NO:97) and T08446_PEA_1_T22 (SEQ ID NO:98). Table 760 below describes the starting and ending position of this segment on each transcript.

TABLE 760 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
T08446_PEA_1_T2 (SEQ ID NO: 97)     1                           287
T08446_PEA_1_T22 (SEQ ID NO: 98)    1                           287

Segment cluster T08446_PEA_1_node_9 (SEQ ID NO:707) according to the present invention is supported by 17 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T08446_PEA_1_T2 (SEQ ID NO:97) and T08446_PEA_1_T22 (SEQ ID NO:98). Table 761 below describes the starting and ending position of this segment on each transcript.

TABLE 761 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
T08446_PEA_1_T2 (SEQ ID NO: 97)     552                         689
T08446_PEA_1_T22 (SEQ ID NO: 98)    552                         689

Segment cluster T08446_PEA_1_node_15 (SEQ ID NO:708) according to the present invention is supported by 0 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T08446_PEA_1_T22 (SEQ ID NO:98). Table 762 below describes the starting and ending position of this segment on each transcript.

TABLE 762 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
T08446_PEA_1_T22 (SEQ ID NO: 98)    829                         968

Segment cluster T08446_PEA_1_node_17 (SEQ ID NO:709) according to the present invention is supported by 22 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T08446_PEA_1_T2 (SEQ ID NO:97). Table 763 below describes the starting and ending position of this segment on each transcript.

TABLE 763 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
T08446_PEA_1_T2 (SEQ ID NO: 97)     783                         905

Segment cluster T08446_PEA_1_node_25 (SEQ ID NO:710) according to the present invention is supported by 24 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T08446_PEA_1_T2 (SEQ ID NO:97). Table 764 below describes the starting and ending position of this segment on each transcript.

TABLE 764 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
T08446_PEA_1_T2 (SEQ ID NO: 97)     1111                        1263

Segment cluster T08446_PEA_1_node_29 (SEQ ID NO:711) according to the present invention is supported by 25 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T08446_PEA_1_T2 (SEQ ID NO:97). Table 765 below describes the starting and ending position of this segment on each transcript.

TABLE 765 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
T08446_PEA_1_T2 (SEQ ID NO: 97)     1367                        1511

Segment cluster T08446_PEA_1_node_38 (SEQ ID NO:712) according to the present invention is supported by 20 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T08446_PEA_1_T2 (SEQ ID NO:97). Table 766 below describes the starting and ending position of this segment on each transcript.

TABLE 766 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
T08446_PEA_1_T2 (SEQ ID NO: 97)     1703                        1848

Segment cluster T08446_PEA_1_node_43 (SEQ ID NO:713) according to the present invention is supported by 15 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T08446_PEA_1_T2 (SEQ ID NO:97). Table 767 below describes the starting and ending position of this segment on each transcript.

TABLE 767 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
T08446_PEA_1_T2 (SEQ ID NO: 97)     1849                        2002

Segment cluster T08446_PEA_1_node_51 (SEQ ID NO:714) according to the present invention is supported by 19 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T08446_PEA_1_T2 (SEQ ID NO:97). Table 768 below describes the starting and ending position of this segment on each transcript.

TABLE 768 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
T08446_PEA_1_T2 (SEQ ID NO: 97)     2224                        2571

Segment cluster T08446_PEA_1_node_52 (SEQ ID NO:715) according to the present invention is supported by 15 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T08446_PEA_1_T2 (SEQ ID NO:97). Table 769 below describes the starting and ending position of this segment on each transcript.

TABLE 769 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
T08446_PEA_1_T2 (SEQ ID NO: 97)     2572                        2694

Segment cluster T08446_PEA_1_node_55 (SEQ ID NO:716) according to the present invention is supported by 21 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T08446_PEA_1_T2 (SEQ ID NO:97). Table 770 below describes the starting and ending position of this segment on each transcript.

TABLE 770 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
T08446_PEA_1_T2 (SEQ ID NO: 97)     2707                        2883

Segment cluster T08446_PEA_1_node_57 (SEQ ID NO:717) according to the present invention is supported by 37 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T08446_PEA_1_T2 (SEQ ID NO:97). Table 771 below describes the starting and ending position of this segment on each transcript.

TABLE 771 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
T08446_PEA_1_T2 (SEQ ID NO: 97)     2884                        3275

Segment cluster T08446_PEA_1_node_59 (SEQ ID NO:718) according to the present invention is supported by 36 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T08446_PEA_1_T2 (SEQ ID NO:97). Table 772 below describes the starting and ending position of this segment on each transcript.

TABLE 772 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
T08446_PEA_1_T2 (SEQ ID NO: 97)     3360                        3670

Segment cluster T08446_PEA_1_node_62 (SEQ ID NO:719) according to the present invention is supported by 36 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T08446_PEA_1_T2 (SEQ ID NO:97). Table 773 below describes the starting and ending position of this segment on each transcript.

TABLE 773 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
T08446_PEA_1_T2 (SEQ ID NO: 97)     3783                        3988

Segment cluster T08446_PEA_1_node_63 (SEQ ID NO:720) according to the present invention is supported by 64 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T08446_PEA_1_T2 (SEQ ID NO:97). Table 774 below describes the starting and ending position of this segment on each transcript.

TABLE 774 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
T08446_PEA_1_T2 (SEQ ID NO: 97)     3989                        4414

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
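As a rough illustration of the length criterion, a segment's length can be computed from its starting and ending positions and compared with the cutoff. The sketch assumes 1-based inclusive positions and treats "about 120 bp" as a strict cutoff of 120, which is a simplification; the function names are hypothetical, and the coordinates are those given for transcript T08446_PEA_1_T2 in Tables 760, 761, 775 and 791.

```python
SHORT_SEGMENT_MAX_BP = 120  # assumed strict reading of "up to about 120 bp"


def segment_length(start, end):
    """Length in bp of a segment given 1-based inclusive positions."""
    return end - start + 1


def classify(start, end, cutoff=SHORT_SEGMENT_MAX_BP):
    """Label a segment 'short' or 'long' relative to the assumed cutoff."""
    return "short" if segment_length(start, end) <= cutoff else "long"


# (start, end) on transcript T08446_PEA_1_T2, taken from the segment tables.
segments = {
    "node_2": (1, 287),
    "node_3": (288, 385),
    "node_9": (552, 689),
    "node_54": (2695, 2706),
}
labels = {name: classify(start, end) for name, (start, end) in segments.items()}
```

With these inputs, node_2 and node_9 (287 bp and 138 bp) are labeled long, while node_3 and node_54 (98 bp and 12 bp) are labeled short, matching the grouping in which the document presents them.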

Segment cluster T08446_PEA_1_node_3 (SEQ ID NO:721) according to the present invention is supported by 14 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T08446_PEA_1_T2 (SEQ ID NO:97) and T08446_PEA_1_T22 (SEQ ID NO:98). Table 775 below describes the starting and ending position of this segment on each transcript.

TABLE 775 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
T08446_PEA_1_T2 (SEQ ID NO: 97)     288                         385
T08446_PEA_1_T22 (SEQ ID NO: 98)    288                         385

Segment cluster T08446_PEA_1_node_5 (SEQ ID NO:722) according to the present invention is supported by 17 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T08446_PEA_1_T2 (SEQ ID NO:97) and T08446_PEA_1_T22 (SEQ ID NO:98). Table 776 below describes the starting and ending position of this segment on each transcript.

TABLE 776 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
T08446_PEA_1_T2 (SEQ ID NO: 97)     386                         470
T08446_PEA_1_T22 (SEQ ID NO: 98)    386                         470

Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment (in relation to lung cancer), shown in Table 777.

TABLE 777 Oligonucleotides related to this segment

Oligonucleotide name             Overexpressed in cancers   Chip reference
T08446_0_9_0 (SEQ ID NO: 234)    lung malignant tumors      LUN

Segment cluster T08446_PEA_1_node_7 (SEQ ID NO:723) according to the present invention is supported by 19 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T08446_PEA_1_T2 (SEQ ID NO:97) and T08446_PEA_1_T22 (SEQ ID NO:98). Table 778 below describes the starting and ending position of this segment on each transcript.

TABLE 778 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
T08446_PEA_1_T2 (SEQ ID NO: 97)     471                         551
T08446_PEA_1_T22 (SEQ ID NO: 98)    471                         551

Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment (in relation to lung cancer), shown in Table 779.

TABLE 779 Oligonucleotides related to this segment

Oligonucleotide name             Overexpressed in cancers   Chip reference
T08446_0_9_0 (SEQ ID NO: 234)    lung malignant tumors      LUN

Segment cluster T08446_PEA_1_node_12 (SEQ ID NO:724) according to the present invention is supported by 14 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T08446_PEA_1_T2 (SEQ ID NO:97) and T08446_PEA_1_T22 (SEQ ID NO:98). Table 780 below describes the starting and ending position of this segment on each transcript.

TABLE 780 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
T08446_PEA_1_T2 (SEQ ID NO: 97)     690                         782
T08446_PEA_1_T22 (SEQ ID NO: 98)    690                         782
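The six segments shared by both transcripts (Tables 760, 761, 775, 776, 778 and 780) cover positions 1 to 782 without gaps or overlaps. A minimal sketch of such a consistency check follows; the helper name is hypothetical, positions are assumed to be 1-based and inclusive, and the check is offered only as an illustration, not as part of the described method.

```python
# (segment name, start, end) on transcript T08446_PEA_1_T2, from the tables above.
shared_segments = [
    ("node_2", 1, 287),
    ("node_9", 552, 689),
    ("node_3", 288, 385),
    ("node_5", 386, 470),
    ("node_7", 471, 551),
    ("node_12", 690, 782),
]


def find_breaks(segments):
    """Return pairs of neighboring segments that do not abut exactly.

    Segments are sorted by start position; two neighbors abut when the
    later one starts one position after the earlier one ends.
    """
    ordered = sorted(segments, key=lambda seg: seg[1])
    breaks = []
    for (name_a, _, end_a), (name_b, start_b, _) in zip(ordered, ordered[1:]):
        if start_b != end_a + 1:
            breaks.append((name_a, name_b))
    return breaks


breaks = find_breaks(shared_segments)
covered = sum(end - start + 1 for _, start, end in shared_segments)
```

An empty result from `find_breaks` together with a covered length of 782 indicates that the listed coordinates tile the shared region exactly.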

Segment cluster T08446_PEA_1_node_13 (SEQ ID NO:725) according to the present invention is supported by 0 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T08446_PEA_1_T22 (SEQ ID NO:98). Table 781 below describes the starting and ending position of this segment on each transcript.

TABLE 781 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
T08446_PEA_1_T22 (SEQ ID NO: 98)    783                         828

Segment cluster T08446_PEA_1_node_19 (SEQ ID NO:726) according to the present invention is supported by 19 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T08446_PEA_1_T2 (SEQ ID NO:97). Table 782 below describes the starting and ending position of this segment on each transcript.

TABLE 782 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
T08446_PEA_1_T2 (SEQ ID NO: 97)     906                         983

Segment cluster T08446_PEA_1_node_21 (SEQ ID NO:727) according to the present invention is supported by 21 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T08446_PEA_1_T2 (SEQ ID NO:97). Table 783 below describes the starting and ending position of this segment on each transcript.

TABLE 783 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
T08446_PEA_1_T2 (SEQ ID NO: 97)     984                         1050

Segment cluster T08446_PEA_1_node_23 (SEQ ID NO:728) according to the present invention is supported by 22 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T08446_PEA_1_T2 (SEQ ID NO:97). Table 784 below describes the starting and ending position of this segment on each transcript.

TABLE 784 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
T08446_PEA_1_T2 (SEQ ID NO: 97)     1051                        1110

Segment cluster T08446_PEA_1_node_27 (SEQ ID NO:729) according to the present invention is supported by 23 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T08446_PEA_1_T2 (SEQ ID NO:97). Table 785 below describes the starting and ending position of this segment on each transcript.

TABLE 785 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
T08446_PEA_1_T2 (SEQ ID NO: 97)     1264                        1366

Segment cluster T08446_PEA_1_node_32 (SEQ ID NO:730) according to the present invention is supported by 23 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T08446_PEA_1_T2 (SEQ ID NO:97). Table 786 below describes the starting and ending position of this segment on each transcript.

TABLE 786 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
T08446_PEA_1_T2 (SEQ ID NO: 97)     1512                        1594

Segment cluster T08446_PEA_1_node_34 (SEQ ID NO:731) according to the present invention is supported by 22 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T08446_PEA_1_T2 (SEQ ID NO:97). Table 787 below describes the starting and ending position of this segment on each transcript.

TABLE 787 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
T08446_PEA_1_T2 (SEQ ID NO: 97)     1595                        1702

Segment cluster T08446_PEA_1_node_45 (SEQ ID NO:732) according to the present invention is supported by 19 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T08446_PEA_1_T2 (SEQ ID NO:97). Table 788 below describes the starting and ending position of this segment on each transcript.

TABLE 788 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
T08446_PEA_1_T2 (SEQ ID NO: 97)     2003                        2091

Segment cluster T08446_PEA_1_node_46 (SEQ ID NO:733) according to the present invention is supported by 18 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T08446_PEA_1_T2 (SEQ ID NO:97). Table 789 below describes the starting and ending position of this segment on each transcript.

TABLE 789 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
T08446_PEA_1_T2 (SEQ ID NO: 97)     2092                        2148

Segment cluster T08446_PEA_1_node_48 (SEQ ID NO:734) according to the present invention is supported by 19 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T08446_PEA_1_T2 (SEQ ID NO:97). Table 790 below describes the starting and ending position of this segment on each transcript.

TABLE 790 Segment location on transcripts

Transcript name                    Segment starting position    Segment ending position
T08446_PEA_1_T2 (SEQ ID NO: 97)    2149                         2223

Segment cluster T08446_PEA_1_node_54 (SEQ ID NO:735) according to the present invention can be found in the following transcript(s): T08446_PEA_1_T2 (SEQ ID NO:97). Table 791 below describes the starting and ending position of this segment on each transcript.

TABLE 791 Segment location on transcripts

Transcript name                    Segment starting position    Segment ending position
T08446_PEA_1_T2 (SEQ ID NO: 97)    2695                         2706

Segment cluster T08446_PEA_1_node_58 (SEQ ID NO:736) according to the present invention is supported by 13 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T08446_PEA_1_T2 (SEQ ID NO:97). Table 792 below describes the starting and ending position of this segment on each transcript.

TABLE 792 Segment location on transcripts

Transcript name                    Segment starting position    Segment ending position
T08446_PEA_1_T2 (SEQ ID NO: 97)    3276                         3359

Segment cluster T08446_PEA_1_node_60 (SEQ ID NO:737) according to the present invention is supported by 27 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T08446_PEA_1_T2 (SEQ ID NO:97). Table 793 below describes the starting and ending position of this segment on each transcript.

TABLE 793 Segment location on transcripts

Transcript name                    Segment starting position    Segment ending position
T08446_PEA_1_T2 (SEQ ID NO: 97)    3671                         3720

Segment cluster T08446_PEA_1_node_61 (SEQ ID NO:738) according to the present invention is supported by 25 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T08446_PEA_1_T2 (SEQ ID NO:97). Table 794 below describes the starting and ending position of this segment on each transcript.

TABLE 794 Segment location on transcripts

Transcript name                    Segment starting position    Segment ending position
T08446_PEA_1_T2 (SEQ ID NO: 97)    3721                         3782

Segment cluster T08446_PEA_1_node_64 (SEQ ID NO:739) according to the present invention can be found in the following transcript(s): T08446_PEA_1_T2 (SEQ ID NO:97). Table 795 below describes the starting and ending position of this segment on each transcript.

TABLE 795 Segment location on transcripts

Transcript name                    Segment starting position    Segment ending position
T08446_PEA_1_T2 (SEQ ID NO: 97)    4415                         4420

Segment cluster T08446_PEA_1_node_65 (SEQ ID NO:740) according to the present invention is supported by 39 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T08446_PEA_1_T2 (SEQ ID NO:97). Table 796 below describes the starting and ending position of this segment on each transcript.

TABLE 796 Segment location on transcripts

Transcript name                    Segment starting position    Segment ending position
T08446_PEA_1_T2 (SEQ ID NO: 97)    4421                         4472

Segment cluster T08446_PEA_1_node_66 (SEQ ID NO:741) according to the present invention is supported by 29 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T08446_PEA_1_T2 (SEQ ID NO:97). Table 797 below describes the starting and ending position of this segment on each transcript.

TABLE 797 Segment location on transcripts

Transcript name                    Segment starting position    Segment ending position
T08446_PEA_1_T2 (SEQ ID NO: 97)    4473                         4539
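The segment coordinates given in Tables 786 to 797 can be collected and checked programmatically. The following is a minimal sketch, not part of the described method: it assumes that the positions are 1-based and inclusive, and the names SEGMENTS, segment_length and adjacent_pairs are hypothetical. It computes the length of each segment and reports which consecutive table entries abut one another on transcript T08446_PEA_1_T2.

```python
# Segment coordinates on T08446_PEA_1_T2, copied from Tables 786 to 797.
SEGMENTS = [
    ("node_32", 1512, 1594),
    ("node_34", 1595, 1702),
    ("node_45", 2003, 2091),
    ("node_46", 2092, 2148),
    ("node_48", 2149, 2223),
    ("node_54", 2695, 2706),
    ("node_58", 3276, 3359),
    ("node_60", 3671, 3720),
    ("node_61", 3721, 3782),
    ("node_64", 4415, 4420),
    ("node_65", 4421, 4472),
    ("node_66", 4473, 4539),
]


def segment_length(start, end):
    """Length in nucleotides, assuming 1-based inclusive positions."""
    return end - start + 1


def adjacent_pairs(segments):
    """Consecutive entries where one segment ends right before the next starts."""
    pairs = []
    for (name_a, _, end_a), (name_b, start_b, _) in zip(segments, segments[1:]):
        if start_b == end_a + 1:
            pairs.append((name_a, name_b))
    return pairs


LENGTHS = {name: segment_length(start, end) for name, start, end in SEGMENTS}
ADJACENT = adjacent_pairs(SEGMENTS)

if __name__ == "__main__":
    for name, length in LENGTHS.items():
        print(name, length)
    print(ADJACENT)
```

Under these assumptions, the two segments for which no library count is stated (node_54 and node_64) are also the shortest entries, at 12 and 6 nucleotides respectively.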

Variant protein alignment to the previously known protein:

Sequence name: SNXQ_HUMAN (SEQ ID NO: 1442)
Sequence documentation: Alignment of: T08446_PEA_1_P18 (SEQ ID NO: 1370) × SNXQ_HUMAN (SEQ ID NO: 1442) . . .
Alignment segment 1/1:
Quality: 1835.00
Escore: 0
Matching length: 185
Total length: 185
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

Sequence name: Q9NT23 (SEQ ID NO: 1443)
Sequence documentation: Alignment of: T08446_PEA_1_P18 (SEQ ID NO: 1370) × Q9NT23 (SEQ ID NO: 1443)
Alignment segment 1/1:
Quality: 8548.00
Escore: 0
Matching length: 862
Total length: 862
Matching Percent Similarity: 99.88
Matching Percent Identity: 99.88
Total Percent Similarity: 99.88
Total Percent Identity: 99.88
Gaps: 0
Alignment:

Sequence name: Q96CP3 (SEQ ID NO: 1444)
Sequence documentation: Alignment of: T08446_PEA_1_P18 (SEQ ID NO: 1370) × Q96CP3 (SEQ ID NO: 1444)
Alignment segment 1/1:
Quality: 3019.00
Escore: 0
Matching length: 295
Total length: 295
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

Sequence name: BAC86902 (SEQ ID NO: 1445)
Sequence documentation: Alignment of: T08446_PEA_1_P18 (SEQ ID NO: 1370) × BAC86902 (SEQ ID NO: 1445)
Alignment segment 1/1:
Quality: 9651.00
Escore: 0
Matching length: 991
Total length: 1019
Matching Percent Similarity: 99.90
Matching Percent Identity: 99.90
Total Percent Similarity: 97.15
Total Percent Identity: 97.15
Gaps: 1
Alignment:
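The relationship between the "matching" and "total" percentages in the alignment reports above can be illustrated with a short calculation. The sketch below rests on an assumption not stated in this section: that the matching percent identity is the number of identical residues divided by the matching length, and that the total percent identity is the same number of identical residues divided by the total length. The function names are hypothetical.

```python
# Values copied from the four alignment reports above:
# (name, matching length, total length, matching % identity, total % identity)
REPORTS = [
    ("SNXQ_HUMAN", 185, 185, 100.00, 100.00),
    ("Q9NT23", 862, 862, 99.88, 99.88),
    ("Q96CP3", 295, 295, 100.00, 100.00),
    ("BAC86902", 991, 1019, 99.90, 97.15),
]


def identical_residues(matching_length, matching_pct):
    """Assumed: matching % identity = identical residues / matching length."""
    return round(matching_length * matching_pct / 100)


def total_identity(matching_length, total_length, matching_pct):
    """Assumed: total % identity = identical residues / total length."""
    n_identical = identical_residues(matching_length, matching_pct)
    return round(100 * n_identical / total_length, 2)


if __name__ == "__main__":
    for name, m_len, t_len, m_pct, t_pct in REPORTS:
        recomputed = total_identity(m_len, t_len, m_pct)
        print(name, identical_residues(m_len, m_pct), recomputed, t_pct)
```

Under this reading, the BAC86902 report corresponds to 990 identical residues over a matching length of 991, and the lower total percent identity reflects the longer total length of 1019 that includes the single gap.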

Description for Cluster HUMCA1XIA

Cluster HUMCA1XIA features 4 transcript(s) and 46 segment(s) of interest, the names for which are given in Tables 798 and 799, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 800.

TABLE 798 Transcripts of interest

Transcript Name    Sequence ID No.
HUMCA1XIA_T16      99
HUMCA1XIA_T17      100
HUMCA1XIA_T19      101
HUMCA1XIA_T20      102

TABLE 799 Segments of interest

Segment Name         Sequence ID No.
HUMCA1XIA_node_0     742
HUMCA1XIA_node_2     743
HUMCA1XIA_node_4     744
HUMCA1XIA_node_6     745
HUMCA1XIA_node_8     746
HUMCA1XIA_node_9     747
HUMCA1XIA_node_18    748
HUMCA1XIA_node_54    749
HUMCA1XIA_node_55    750
HUMCA1XIA_node_92    751
HUMCA1XIA_node_11    752
HUMCA1XIA_node_15    753
HUMCA1XIA_node_19    754
HUMCA1XIA_node_21    755
HUMCA1XIA_node_23    756
HUMCA1XIA_node_25    757
HUMCA1XIA_node_27    758
HUMCA1XIA_node_29    759
HUMCA1XIA_node_31    760
HUMCA1XIA_node_33    761
HUMCA1XIA_node_35    762
HUMCA1XIA_node_37    763
HUMCA1XIA_node_39    764
HUMCA1XIA_node_41    765
HUMCA1XIA_node_43    766
HUMCA1XIA_node_45    767
HUMCA1XIA_node_47    768
HUMCA1XIA_node_49    769
HUMCA1XIA_node_51    770
HUMCA1XIA_node_57    771
HUMCA1XIA_node_59    772
HUMCA1XIA_node_62    773
HUMCA1XIA_node_64    774
HUMCA1XIA_node_66    775
HUMCA1XIA_node_68    776
HUMCA1XIA_node_70    777
HUMCA1XIA_node_72    778
HUMCA1XIA_node_74    779
HUMCA1XIA_node_76    780
HUMCA1XIA_node_78    782
HUMCA1XIA_node_81    783
HUMCA1XIA_node_83    784
HUMCA1XIA_node_85    785
HUMCA1XIA_node_87    786
HUMCA1XIA_node_89    787
HUMCA1XIA_node_91    788

TABLE 800 Proteins of interest

Protein Name     Sequence ID No.    Corresponding Transcript(s)
HUMCA1XIA_P14    1372               HUMCA1XIA_T16 (SEQ ID NO: 99)
HUMCA1XIA_P15    1373               HUMCA1XIA_T17 (SEQ ID NO: 100)
HUMCA1XIA_P16    1374               HUMCA1XIA_T19 (SEQ ID NO: 101)
HUMCA1XIA_P17    1375               HUMCA1XIA_T20 (SEQ ID NO: 102)

These sequences are variants of the known protein Collagen alpha 1 (SwissProt accession identifier CA1B_HUMAN), SEQ ID NO: 1446, referred to herein as the previously known protein.

Protein Collagen alpha 1 (SEQ ID NO:1446) is known or believed to have the following function(s): May play an important role in fibrillogenesis by controlling lateral growth of collagen II fibrils. The sequence for protein Collagen alpha 1 is given at the end of the application, as “Collagen alpha 1 amino acid sequence”. Known polymorphisms for this sequence are as shown in Table 801.

TABLE 801 Amino acid mutations for Known Protein

SNP position(s) on amino acid sequence    Comment
625                                       G -> V (in STL2). /FTId = VAR_013583.
676                                       G -> R (in STL2; overlapping phenotype with Marshall syndrome). /FTId = VAR_013584.
921-926                                   Missing (in STL2; overlapping phenotype with Marshall syndrome). /FTId = VAR_013585.
1313-1315                                 Missing (in STL2; overlapping phenotype with Marshall syndrome). /FTId = VAR_013586.
1516                                      G -> V (in STL2; overlapping phenotype with Marshall syndrome). /FTId = VAR_013587.
941-944                                   KDGL -> RMGC
986                                       Y -> H
1074                                      R -> P
1142                                      G -> D
1218                                      M -> W
1758                                      T -> A
1786                                      S -> N

The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: cartilage condensation; vision; hearing; cell-cell adhesion; extracellular matrix organization and biogenesis, which are annotation(s) related to Biological Process; extracellular matrix structural protein; extracellular matrix protein, adhesive, which are annotation(s) related to Molecular Function; and extracellular matrix; collagen; collagen type XI, which are annotation(s) related to Cellular Component.

The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <dot expasy dot ch/sprot/>; or Locuslink, available from <dot ncbi dot nlm dot nih dot gov/projects/LocusLink/>.

Cluster HUMCA1XIA can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term “number” in the right hand column of the table and the numbers on the y-axis of FIG. 32 refer to weighted expression of ESTs in each category, as “parts per million” (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
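The "parts per million" ratio described above can be expressed as a simple calculation. The following is an illustrative sketch only: the function name and the example counts are hypothetical, and the weighting of EST expression mentioned above is not described in this section and is therefore omitted.

```python
def parts_per_million(cluster_ests, category_ests):
    """Ratio of ESTs for one cluster to all ESTs in a category, scaled to ppm.

    Unweighted sketch; any weighting of the EST counts is omitted here.
    """
    if category_ests <= 0:
        raise ValueError("category_ests must be positive")
    return 1_000_000 * cluster_ests / category_ests


if __name__ == "__main__":
    # Hypothetical counts, not taken from the application:
    # 3 ESTs from the cluster among 250,000 ESTs in the tissue category.
    print(parts_per_million(3, 250_000))
```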

Overall, the following results were obtained as shown with regard to the histograms in FIG. 32 and Table 802. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: bone malignant tumors, epithelial malignant tumors, a mixture of malignant tumors from different tissues and lung malignant tumors.

TABLE 802 Normal tissue distribution

Name of Tissue    Number
adrenal           0
bone              207
brain             13
colon             0
epithelial        11
general           11
head and neck     0
kidney            0
lung              0
breast            8
pancreas          0
stomach           73
uterus            9

TABLE 803 P values and ratios for expression in cancerous tissue

Name of Tissue    P1         P2         SP1        R3     SP2        R4
adrenal           4.2e−01    1.9e−01    9.6e−02    3.4    8.2e−02    3.6
bone              2.4e−01    6.3e−01    7.7e−10    4.3    5.3e−03    1.6
brain             5.0e−01    6.9e−01    1.8e−01    2.1    4.2e−01    1.3
colon             1.3e−02    2.9e−02    2.4e−01    3.0    3.5e−01    2.4
epithelial        3.9e−04    3.2e−03    1.3e−03    2.3    1.8e−02    1.7
general           5.6e−05    1.6e−03    9.5e−17    4.5    1.1e−09    2.8
head and neck     1.2e−01    2.1e−01    1          1.3    1          1.1
kidney            6.5e−01    7.2e−01    3.4e−01    2.4    4.9e−01    1.9
lung              5.3e−02    9.1e−02    5.5e−05    7.3    5.0e−03    4.0
breast            4.3e−01    5.6e−01    6.9e−01    1.4    8.2e−01    1.1
pancreas          3.3e−01    1.8e−01    4.2e−01    2.4    1.5e−01    3.7
stomach           5.0e−01    6.1e−01    6.9e−01    1.0    6.7e−01    0.8
uterus            7.1e−01    7.0e−01    6.6e−01    1.1    6.4e−01    1.1
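As an illustration of how Table 803 may be read, the rows can be filtered by one of the listed values. The sketch below is hedged in several respects: the cutoff of 0.01 is a hypothetical value chosen only for illustration, the criteria actually used for calling overexpression are those described elsewhere in the application, and it is assumed that the "general" row corresponds to the mixture of malignant tumors from different tissues. With this illustrative cutoff applied to the SP1 column, the selected rows coincide with the four conditions named in the text.

```python
# SP1 values per tissue, copied from Table 803.
SP1 = {
    "adrenal": 9.6e-02,
    "bone": 7.7e-10,
    "brain": 1.8e-01,
    "colon": 2.4e-01,
    "epithelial": 1.3e-03,
    "general": 9.5e-17,
    "head and neck": 1.0,
    "kidney": 3.4e-01,
    "lung": 5.5e-05,
    "breast": 6.9e-01,
    "pancreas": 4.2e-01,
    "stomach": 6.9e-01,
    "uterus": 6.6e-01,
}

# Hypothetical cutoff for illustration; not a value given in the application.
CUTOFF = 0.01


def tissues_below(scores, cutoff):
    """Tissues whose score is below the cutoff, in table order."""
    return [tissue for tissue, value in scores.items() if value < cutoff]


FLAGGED = tissues_below(SP1, CUTOFF)

if __name__ == "__main__":
    print(FLAGGED)
```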

As noted above, cluster HUMCA1XIA features 4 transcript(s), which were listed in Table 798 above. These transcript(s) encode for protein(s) which are variant(s) of protein Collagen alpha 1 (SEQ ID NO:1446). A description of each variant protein according to the present invention is now provided.

Variant protein HUMCA1XIA_P14 (SEQ ID NO:1372) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMCA1XIA_T16 (SEQ ID NO:99). An alignment is given to the known protein (Collagen alpha 1 (SEQ ID NO:1446)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HUMCA1XIA_P14 (SEQ ID NO:1372) and CA1B_HUMAN_V5 (SEQ ID NO: 1447):

1. An isolated chimeric polypeptide encoding for HUMCA 1 XIA_P 14 (SEQID NO:1372), comprising a first amino acid sequence being at least 90%homologous toMEPWSSRWKTKRWLWDFTVTTLALTFLFQAREVRGAAPVDVLKALDFHNSPEGISKTTGFCTNRKNSKGSDTAYRVSKQAQLSAPTKQLFPGGTFPEDFSILFTVKPKKGIQSFLLSIYNEHGIQQIGVEVGRSPVFLFEDHTGKPAPEDYPLFRTVNIADGKWHRVAISVEKKTVTMIVDCKKKTTKPLDRSERAIVDTNGITVFGTRILDEEVFEGDIQQFLITGDPKAAYDYCEHYSPDCDSSAPKAAQAQEPQIDEYAPEDIIEYDYEYGEAEYKEAESVTEGPTVTEETIAQTEANIVDDFQEYNYGTMESYQTEAPRHVSGTNEPNPVEEIFTEEYLTGEDYDSQRKNSEDTYLENKEIDGRDSDLLVDGDLGEYDFYEYKEYEDKPTSPPNEEFGPGVPAETDITETSINGHGAYGEKGQKGEPAVVEPGMLVEGPPGPAGPAGIMGPPGLQGPTGPPGDPGDRGPPGRPGLPGADGLPGPPGTMLMLPFRYGGDGSKGPTISAQEAQAQAILQQARIALRGPPGPMGLTGRPGPVGGPGSSGAKGESGDPGPQGPRGVQGPPGPTGKPGKRGRPGADGGRGMPGEPGAKGDRGFDGLPGLPGDKGHRGERGPQGPPGPPGDDGMRGEDGEIGPRGLPGEAGPRGLLGPRGTPGAPGQPGMAGDGPPGPKGNMGPQGEPGPPGQQGNPGPQGLPGPQGPIGPPGEKGPQGKPGLAGLPGADGPPGHPGKEGQSGEKGALGPPGPGGPIGYPGPRGVKGADGVRGLKGSKGEKGEDGFPGFKGDMGLKGDREVGQIGPRGEDGPEGPKGRAGTGDPGPSGQAGEKGKLGVPGLPGYPGRQGPKGSTGFPGFPGANGEKGARGVAGKPGPRGQRGPTGPRGSRGARGPTGKPGPKGTSGGDGPPGPPGERGPQGPQGPVGFPGKGPPGPPGKDGLPGHPGQRGETGFQGKTGPPGPGGVVGPQGPTGETGPIGERGHPGPPGPPGEQGLPGAAGKEGAKGDPGPQGISGKDGPAGLRGFPGERGLPGAQGAPGLKGGEGPQGPPGP Vcorresponding to amino acids 1-1056 of CA1B_HUMAN_V5 (SEQ ID NO:1447),which also corresponds to amino acids 1-1056 of HUMCA1XIA_P14 (SEQ IDNO:1372), and a second amino acid sequence being at least 70%,optionally at least 80%, preferably at least 85%, more preferably atleast 90% and most preferably at least 95% homologous to a polypeptidehaving the sequence VSMMIINSQTIMVVNYSSSFITLML (SEQ ID NO: 256)corresponding to amino acids 1057-1081 of HUMCA1XIA_P14 (SEQ IDNO:1372), wherein said first amino amino acid sequence and second aminoacid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of HUMCA1XIA_P14 (SEQ ID NO:1372), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VSMMIINSQTIMVVNYSSSFITLML (SEQ ID NO: 256) in HUMCA1XIA_P14 (SEQ ID NO:1372).

It should be noted that the known protein sequence (CA1B_HUMAN (SEQ ID NO:1446)) differs by one or more changes from the sequence given at the end of the application and named as being the amino acid sequence for CA1B_HUMAN_V5 (SEQ ID NO:1447). These changes were previously known to occur and are listed in the table below.

TABLE 804 Changes to CA1B_HUMAN_V5 (SEQ ID NO: 1447)

SNP position(s) on amino acid sequence    Type of change
987                                       conflict

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein HUMCA1XIA_P14 (SEQ ID NO:1372) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 805 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCA1XIA_P14 (SEQ ID NO:1372) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 805 Amino acid mutations

SNP position(s) on amino acid sequence    Alternative amino acid(s)    Previously known SNP?
8                                         W -> G                       Yes
46                                        D -> E                       Yes
559                                       G -> S                       Yes
832                                       G -> *                       Yes
986                                       H -> Y                       Yes
1061                                      I -> M                       Yes
1070                                      V -> A                       Yes

Variant protein HUMCA1XIA_P14 (SEQ ID NO:1372) is encoded by the following transcript(s): HUMCA1XIA_T16 (SEQ ID NO:99), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMCA1XIA_T16 (SEQ ID NO:99) is shown in bold; this coding portion starts at position 319 and ends at position 3561. The transcript also has the following SNPs as listed in Table 806 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCA1XIA_P14 (SEQ ID NO:1372) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 806 Nucleic acid SNPs

SNP position on nucleotide sequence    Alternative nucleic acid    Previously known SNP?
157                                    A -> G                      No
241                                    T -> A                      Yes
340                                    T -> G                      Yes
456                                    T -> G                      Yes
1993                                   G -> A                      Yes
2812                                   G -> T                      Yes
3274                                   C -> T                      Yes
3282                                   C -> T                      Yes
3501                                   A -> G                      Yes
3527                                   T -> C                      Yes
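The nucleotide positions of Table 806 can be related to the amino acid positions of Table 805 through the stated coding portion of transcript HUMCA1XIA_T16 (positions 319 to 3561). The following sketch assumes 1-based positions and in-frame translation starting at position 319; the function name is hypothetical.

```python
CDS_START = 319   # first position of the coding portion of HUMCA1XIA_T16
CDS_END = 3561    # last position of the coding portion

# Positions copied from Table 806 (nucleotide) and Table 805 (amino acid).
TABLE_806_POSITIONS = [157, 241, 340, 456, 1993, 2812, 3274, 3282, 3501, 3527]
TABLE_805_POSITIONS = {8, 46, 559, 832, 986, 1061, 1070}


def codon_number(nt_pos, cds_start=CDS_START, cds_end=CDS_END):
    """1-based codon index of a 1-based transcript position.

    Returns None when the position lies outside the coding portion.
    """
    if not cds_start <= nt_pos <= cds_end:
        return None
    return (nt_pos - cds_start) // 3 + 1


CODONS = [codon_number(pos) for pos in TABLE_806_POSITIONS]

if __name__ == "__main__":
    for pos, codon in zip(TABLE_806_POSITIONS, CODONS):
        print(pos, codon, codon in TABLE_805_POSITIONS)
```

Under these assumptions, positions 157 and 241 fall upstream of the coding portion, position 3282 maps to codon 988, which does not appear in Table 805 (which lists only non-silent SNPs), and the remaining seven positions map to the seven amino acid positions of Table 805.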

Variant protein HUMCA1XIA_P15 (SEQ ID NO:1373) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMCA1XIA_T17 (SEQ ID NO:100). An alignment is given to the known protein (Collagen alpha 1 (SEQ ID NO:1446)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HUMCA1XIA_P15 (SEQ ID NO:1373) and CA1B_HUMAN (SEQ ID NO:1446):

1. An isolated chimeric polypeptide encoding for HUMCA1XIA_P15 (SEQ IDNO:1373), comprising a first amino acid sequence being at least 90%homologous toMEPWSSRWKTKRWLWDFTVTTLALTFLFQAREVRGAAPVDVLKALKDFHNSPEGISKTTGFCTNRKNSKGSDTAYRVSKQAQLSAPTKQLFPGGTFPEDFSILFTVKPKKGIQSFLLSIYNEHGIQQIGVEVGRSPVFLFEDHTGKPAPEDYRLFRTVNIADGKWHRVAISVEKKTVTMIVDCKKKTTKPLDRSERAIVDTNGITVFGTRILDEEVFEGDIQQFLITGDPKAAYDYCEHYSPDCDSSAPKAAQAQEPQIDEYAPEDIIEYDYEYGEAEYKEAESVTEGPTVTEETIAQTEANIVDDFQEYNYGTMESYQTEAPRHVSGTNEPNPVEEIFTEEYLTGEDYDSQRKNSEDTLYENKEIDGRDSDLLVDGDLGEYDFYEYKEYEDKPTSPPNEEFGPGVPAETDITETSINGHGAYGEKCQKGEPAVVEPGMLVEGPPGPAGPAGIMGPPGLQGPTGPPGDPGDRGPPGRPGLPGADGLPGPPGTMLMLPFRYGGDGSKGPTISAQEAQAQAILQQARIALRGPPGPMGLTGRPGPVGGPGSSGAKGESGDPGPQGPRGVQGPPGPTGKPGKRGRPGADGGRGMPGEPGAKGDRGFDGLPGLPGDKGHRGERGPQGPPGPPGDDGMRGEDGEIGPRGLPGEAGPRGLLGPRGTPGAPGQPGMAGVDGPPGPKGNMGPQGEPGPPGQQGNPGPQGLPGPQGPIGPPGEK corresponding to amino acids 1-714 of CA1B_HUMAN (SEQ IDNO:1446), which also corresponds to amino acids 1-714 of HUMCA1XIA_P15(SEQ ID NO:1373), and a second amino acid sequence being at least 70%,optionally at least 80%, preferably at least 85%, more preferably atleast 90% and most preferably at least 95% homologous to a polypeptidehaving the sequence MCCNLSFGILIPLQK (SEQ ID NO: 257) corresponding toamino acids 715-729 of HUMCA1XIA_P15 (SEQ ID NO:1373), wherein saidfirst acid sequence and second amino acid sequence are contiguous and ina sequential order.

2. An isolated polypeptide encoding for a tail of HUMCA1XIA_P15 (SEQ ID NO:1373), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MCCNLSFGILIPLQK (SEQ ID NO: 257) in HUMCA1XIA_P15 (SEQ ID NO:1373).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein HUMCA1XIA_P15 (SEQ ID NO:1373) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 807 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCA1XIA_P15 (SEQ ID NO:1373) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 807 Amino acid mutations

SNP position(s) on amino acid sequence    Alternative amino acid(s)    Previously known SNP?
8                                         W -> G                       Yes
46                                        D -> E                       Yes
559                                       G -> S                       Yes

The glycosylation sites of variant protein HUMCA1XIA_P15 (SEQ ID NO:1373), as compared to the known protein Collagen alpha 1 (SEQ ID NO:1446), are described in Table 808 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).

TABLE 808 Glycosylation site(s)

Position(s) on known amino acid sequence    Present in variant protein?
1640                                        no

Variant protein HUMCA1XIA_P15 (SEQ ID NO:1373) is encoded by the following transcript(s): HUMCA1XIA_T17 (SEQ ID NO:100), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMCA1XIA_T17 (SEQ ID NO:100) is shown in bold; this coding portion starts at position 319 and ends at position 2505. The transcript also has the following SNPs as listed in Table 809 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCA1XIA_P15 (SEQ ID NO:1373) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 809 Nucleic acid SNPs

SNP position on nucleotide sequence    Alternative nucleic acid    Previously known SNP?
157                                    A -> G                      No
241                                    T -> A                      Yes
340                                    T -> G                      Yes
456                                    T -> G                      Yes
1993                                   G -> A                      Yes
2473                                   C -> T                      Yes

Variant protein HUMCA1XIA_P16 (SEQ ID NO:1374) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMCA1XIA_T19 (SEQ ID NO:101). An alignment is given to the known protein (Collagen alpha 1 (SEQ ID NO:1446)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HUMCA1XIA_P16 (SEQ ID NO:1374) and CA1B_HUMAN (SEQ ID NO:1446):

1. An isolated chimeric polypeptide encoding for HUMCA1XIA_P16 (SEQ IDNO:1374), comprising a first amino acid sequence being at least 90%homologous toMEPWSSRWKTKRWLWDFTVTTLALTFLFQAREVRGAAPVDVLKALDFHNSPEGISKTTGFCTNRKNSKGSKTAYRVSKQAQLSAPTKQLFPGGTFPEDFSILFTVKPKKGIQSFLLSIYNEHGIQQIGVEVGRSPVFLFEDHTGKPAPEDYPLFRTVNIADGKWHRVAISVEKKTVTMIVDCKKKTTKPLDRSERAIVDTNGITVFGTRILDEEVFEGDIQQFLITGDPKAAYDYCEHYSPDCDSSAPKAAQAQEPQIDEYAPEDIIEYDYEYGEAEYKEAESVTEGPTVTEETIAQTEANIVDDFQEYNYGTMESYQTEAPRHVSGTNEPNPVEIFTEEYLTGEDYDSQRKNSEDTLYENKEIDGRDSDLLVDGDLGEYDFYEYKEYEDKPTSPPNEEFGPGVPAETDITETSINGHGAYGEKGQKGEPAVVEPGMLVEGPPGPAGPAGIMGPPGLQGPTGPPGDPGDRGPPGRPGLPGADGLPGPPGTMLMLPFRYGGDGSKGPTISAQEAQAQAILQQARIALRGPPGPMGLTGRPGPVGGPGSSGAKGESGDPGPQGPRGVQGPPGPTGKPGKRGPRGADGGRGMPGEPGAKGDRGFDGLPGLPGDKGHRGERGPQGPPGPPGDDGMRGEDGEIGPRGLPGEA corresponding to amino acids 1-648 of CA1B_HUMAN (SEQ IDNO:1446), which also corresponds to amino acids 1-648 of HUMCA1XIA_P16(SEQ ID NO:1374), a second amino acid sequence being at least 90%homologous to GMAGVDGPPGPKGNMGPQGEPGPPGQQGNPGPQGLPGPQGPIGPPGEKcorresponding to amino acids 667-714 of CA1B_HUMAN (SEQ ID NO:1446),which also corresponds to amino acids 649-696 of HUMCA1XIA_P16 (SEQ IDNO:1374), and a third amino acid sequence being at least 70% optionallyat least 80%, preferably at least 85%, more preferably at least 90% andmost preferably at least 95% homologous to a polypeptide having thesequence VSFSFSLFYKKVIKFACDKRFVGRHDERKVVKLSLPLYLIYE (SEQ ID NO: 258)corresponding to amino acids 697-738 of HUMCA1XIA_P16 (SEQ ID NO:1374),wherein said first amino acid sequence, second amino acid sequence andthird amino acid sequence are contiguous and in a sequential order.

2. An isolated chimeric polypeptide encoding for an edge portion of HUMCA1XIA_P16 (SEQ ID NO:1374), comprising a polypeptide having a length “n”, wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise AG, having a structure as follows: a sequence starting from any of amino acid numbers 648-x to 648; and ending at any of amino acid numbers 649+((n−2)−x), in which x varies from 0 to n−2.

3. An isolated polypeptide encoding for a tail of HUMCA1XIA_P16 (SEQ ID NO:1374), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VSFSFSLFYKKVIKFACDKRFVGRHDERKVVKLSLPLYLIYE (SEQ ID NO: 258) in HUMCA1XIA_P16 (SEQ ID NO:1374).
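The edge portion structure recited above (a sequence of length n starting at amino acid 648−x and ending at amino acid 649+((n−2)−x), with x varying from 0 to n−2) can be enumerated as a set of windows that each span the junction between amino acids 648 and 649 of HUMCA1XIA_P16. The following is a minimal sketch of that enumeration: the function name is hypothetical, and the bounds check against the protein length of 738 amino acids (taken from the tail positions 697-738 recited above) is an added safeguard rather than part of the recited structure.

```python
JUNCTION = 648        # last amino acid before the junction in HUMCA1XIA_P16
PROTEIN_LENGTH = 738  # from the recited tail positions 697-738


def edge_windows(n, junction=JUNCTION, protein_length=PROTEIN_LENGTH):
    """All (start, end) windows of length n that contain both junction residues.

    Positions are 1-based and inclusive; x runs from 0 to n-2 as recited.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to span the junction")
    windows = []
    for x in range(0, n - 1):
        start = junction - x
        end = junction + 1 + ((n - 2) - x)
        # Added safeguard: keep only windows that lie within the protein.
        if start >= 1 and end <= protein_length:
            windows.append((start, end))
    return windows


if __name__ == "__main__":
    for start, end in edge_windows(10):
        print(start, end, end - start + 1)
```

For n = 10 this yields nine windows, from 648-657 (x = 0) to 640-649 (x = 8), each of which includes both amino acid 648 and amino acid 649.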

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein HUMCA1XIA_P16 (SEQ ID NO:1374) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 810 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCA1XIA_P16 (SEQ ID NO:1374) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 810 Amino acid mutations

SNP position(s) on amino acid sequence    Alternative amino acid(s)    Previously known SNP?
8                                         W -> G                       Yes
46                                        D -> E                       Yes
559                                       G -> S                       Yes

The glycosylation sites of variant protein HUMCA1XIA_P16 (SEQ ID NO:1374), as compared to the known protein Collagen alpha 1 (SEQ ID NO:1446), are described in Table 811 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).

TABLE 811 Glycosylation site(s)

Position(s) on known amino acid sequence    Present in variant protein?
1640                                        no
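The correspondence between positions on the known protein and positions on variant protein HUMCA1XIA_P16 follows from the comparison report above: amino acids 1-648 of CA1B_HUMAN correspond to amino acids 1-648 of the variant, and amino acids 667-714 of CA1B_HUMAN correspond to amino acids 649-696 of the variant. The sketch below, which uses hypothetical names, maps a position on the known protein to the variant and returns None where no corresponding position is recited. It is offered only as an illustration; the application does not state that this mapping is how Table 811 was derived.

```python
# (known start, known end, variant start) for the regions of HUMCA1XIA_P16
# that the comparison report aligns to CA1B_HUMAN.
SHARED_REGIONS = [
    (1, 648, 1),
    (667, 714, 649),
]


def known_to_variant(known_pos, regions=SHARED_REGIONS):
    """Map a 1-based position on the known protein to the variant protein.

    Returns None if the position is not in any shared region.
    """
    for known_start, known_end, variant_start in regions:
        if known_start <= known_pos <= known_end:
            return variant_start + (known_pos - known_start)
    return None


if __name__ == "__main__":
    for pos in (648, 650, 667, 714, 1640):
        print(pos, known_to_variant(pos))
```

In this sketch, position 1640 of the known protein, the glycosylation site listed in Table 811, has no counterpart in the variant, which is consistent with the table entry "no".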

Variant protein HUMCA1XIA_P16 (SEQ ID NO:1374) is encoded by the following transcript(s): HUMCA1XIA_T19 (SEQ ID NO:101), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMCA1XIA_T19 (SEQ ID NO:101) is shown in bold; this coding portion starts at position 319 and ends at position 2532. The transcript also has the following SNPs as listed in Table 812 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCA1XIA_P16 (SEQ ID NO:1374) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 812 Nucleic acid SNPs

SNP position on nucleotide sequence    Alternative nucleic acid    Previously known SNP?
157                                    A -> G                      No
241                                    T -> A                      Yes
340                                    T -> G                      Yes
456                                    T -> G                      Yes
1993                                   G -> A                      Yes
2606                                   C -> A                      Yes
2677                                   T -> G                      Yes
2849                                   C -> T                      Yes

Variant protein HUMCA1XIA_P17 (SEQ ID NO:1375) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMCA1XIA_T20 (SEQ ID NO:102). An alignment is given to the known protein (Collagen alpha 1 (SEQ ID NO:1446)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HUMCA1XIA_P17 (SEQ ID NO:1375) and CA1B_HUMAN (SEQ ID NO:1446):

1. An isolated chimeric polypeptide encoding for HUMCA1XIA_P17 (SEQ IDNO:1375), comprising a first amino acid sequence being at least 90%homologous toMEPWSSRWKTKRWLWDFTVTTLALTFLFQAREVRGAAPVDVLKALKFHNSPEGISKTTGFCTNRKNSKGSDTAYRVSKQAQLSAPTKQLFPGGTFPEDFSILFTVKPKKGIQSFLLSIYNEHGIQQIGVEVGRSPVFLFEDHTGKPAPEDYPLFRTVNIADGKWHRVAISVEKKTVTMIVDCKKKTTKPLDRSERAIVDTNGITVFGTRILDEEVFEGDIQQFLITGDPKAAYDYCEHYSPDCDSSAPKAAQAQEPQIDE corresponding to aminoacids 1-260 of CA1B_HUMAN (SEQ ID NO:1446), which also corresponds toamino acids 1-260 of HUMCA1XIA_P17 (SEQ ID NO:1375), and a second aminoacid sequence being at least 70%, optionally at least 80%, preferably atleast 85%, more preferably at least 90% and most preferably at least 95%homologous to a polypeptide having the sequence VRSTRPEKVFVFQ (SEQ IDNO: 259) corresponding to amino acids 261-273 of HUMCA1XIA_P17 (SEQ IDNO:1375), wherein said first amino acid sequence and second amino acidsequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of HUMCA1XIA_P17 (SEQ ID NO:1375), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRSTRPEKVFVFQ (SEQ ID NO: 259) in HUMCA1XIA_P17 (SEQ ID NO:1375).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein HUMCA1XIA_P17 (SEQ ID NO:1375) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 813 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCA1XIA_P17 (SEQ ID NO:1375) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 813 Amino acid mutations

SNP position(s) on amino acid sequence    Alternative amino acid(s)    Previously known SNP?
8                                         W -> G                       Yes
46                                        D -> E                       Yes

The glycosylation sites of variant protein HUMCA1XIA_P17 (SEQ IDNO:1375), as compared to the known protein Collagen alpha 1 (SEQ IDNO:1446), are described in Table 814 (given according to theirposition(s) on the amino acid sequence in the first column; the secondcolumn indicates whether the glycosylation site is present in thevariant protein; and the last column indicates whether the position isdifferent on the variant protein).

TABLE 814 Glycosylation site(s) Position(s) on known amino acid sequencePresent in variant protein? 1640 no

Variant protein HUMCA1XIA_P17 (SEQ ID NO:1375) is encoded by the following transcript(s): HUMCA1XIA_T20 (SEQ ID NO:102), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMCA1XIA_T20 (SEQ ID NO:102) is shown in bold; this coding portion starts at position 319 and ends at position 1137. The transcript also has the following SNPs as listed in Table 815 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCA1XIA_P17 (SEQ ID NO:1375) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 815 Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
157 | A -> G | No
241 | T -> A | Yes
340 | T -> G | Yes
456 | T -> G | Yes
1150 | A -> C | Yes
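The relationship between the nucleotide SNP positions of Table 815 and the coding portion stated above (positions 319 to 1137 of HUMCA1XIA_T20) can be illustrated with a short sketch. It is only an illustration, not part of the described method: it assumes 1-based inclusive coordinates and that codon 1 begins at position 319, and the function name `locate_snp` is hypothetical. Under those assumptions, positions 340 and 456 fall in codons 8 and 46, which agrees with the amino acid positions listed in Table 813.

```python
# Coding-portion bounds of transcript HUMCA1XIA_T20 as stated in the text
# (assumed here to be 1-based and inclusive).
CDS_START = 319
CDS_END = 1137

# Nucleotide SNP positions copied from Table 815.
SNP_POSITIONS = [157, 241, 340, 456, 1150]


def locate_snp(position, cds_start=CDS_START, cds_end=CDS_END):
    """Return (region, codon_number, base_in_codon) for a transcript position.

    codon_number and base_in_codon are None outside the coding portion.
    """
    if position < cds_start:
        return ("5' of coding portion", None, None)
    if position > cds_end:
        return ("3' of coding portion", None, None)
    offset = position - cds_start  # 0-based offset into the coding portion
    return ("coding", offset // 3 + 1, offset % 3 + 1)


for pos in SNP_POSITIONS:
    print(pos, locate_snp(pos))
```

The same arithmetic places the SNPs at positions 157 and 241 upstream of the coding portion and the SNP at position 1150 downstream of it, which would be consistent with only two of the five SNPs appearing in Table 813.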

As noted above, cluster HUMCA1XIA features 46 segment(s), which were listed in Table 799 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster HUMCA1XIA_node_0 (SEQ ID NO:742) according to the present invention is supported by 13 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16 (SEQ ID NO:99), HUMCA1XIA_T17 (SEQ ID NO:100), HUMCA1XIA_T19 (SEQ ID NO:101) and HUMCA1XIA_T20 (SEQ ID NO:102). Table 816 below describes the starting and ending position of this segment on each transcript.

TABLE 816 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMCA1XIA_T16 (SEQ ID NO: 99) | 1 | 424
HUMCA1XIA_T17 (SEQ ID NO: 100) | 1 | 424
HUMCA1XIA_T19 (SEQ ID NO: 101) | 1 | 424
HUMCA1XIA_T20 (SEQ ID NO: 102) | 1 | 424

Segment cluster HUMCA1XIA_node_2 (SEQ ID NO:743) according to the present invention is supported by 9 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16 (SEQ ID NO:99), HUMCA1XIA_T17 (SEQ ID NO:100), HUMCA1XIA_T19 (SEQ ID NO:101) and HUMCA1XIA_T20 (SEQ ID NO:102). Table 817 below describes the starting and ending position of this segment on each transcript.

TABLE 817 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMCA1XIA_T16 (SEQ ID NO: 99) | 425 | 592
HUMCA1XIA_T17 (SEQ ID NO: 100) | 425 | 592
HUMCA1XIA_T19 (SEQ ID NO: 101) | 425 | 592
HUMCA1XIA_T20 (SEQ ID NO: 102) | 425 | 592

Segment cluster HUMCA1XIA_node_4 (SEQ ID NO:744) according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16 (SEQ ID NO:99), HUMCA1XIA_T17 (SEQ ID NO:100), HUMCA1XIA_T19 (SEQ ID NO:101) and HUMCA1XIA_T20 (SEQ ID NO:102). Table 818 below describes the starting and ending position of this segment on each transcript.

TABLE 818 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMCA1XIA_T16 (SEQ ID NO: 99) | 593 | 806
HUMCA1XIA_T17 (SEQ ID NO: 100) | 593 | 806
HUMCA1XIA_T19 (SEQ ID NO: 101) | 593 | 806
HUMCA1XIA_T20 (SEQ ID NO: 102) | 593 | 806

Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment (in relation to lung cancer), shown in Table 819.

TABLE 819 Oligonucleotides related to this segment
Oligonucleotide name | Overexpressed in cancers | Chip reference
HUMCA1XIA_0_18_0 (SEQ ID NO: 236) | lung malignant tumors | LUN

Segment cluster HUMCA1XIA_node_6 (SEQ ID NO:745) according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16 (SEQ ID NO:99), HUMCA1XIA_T17 (SEQ ID NO:100), HUMCA1XIA_T19 (SEQ ID NO:101) and HUMCA1XIA_T20 (SEQ ID NO:102). Table 820 below describes the starting and ending position of this segment on each transcript.

TABLE 820 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMCA1XIA_T16 (SEQ ID NO: 99) | 807 | 969
HUMCA1XIA_T17 (SEQ ID NO: 100) | 807 | 969
HUMCA1XIA_T19 (SEQ ID NO: 101) | 807 | 969
HUMCA1XIA_T20 (SEQ ID NO: 102) | 807 | 969

Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment (in relation to lung cancer), shown in Table 821.

TABLE 821 Oligonucleotides related to this segment
Oligonucleotide name | Overexpressed in cancers | Chip reference
HUMCA1XIA_0_18_0 (SEQ ID NO: 236) | lung malignant tumors | LUN

Segment cluster HUMCA1XIA_node_8 (SEQ ID NO:746) according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16 (SEQ ID NO:99), HUMCA1XIA_T17 (SEQ ID NO:100), HUMCA1XIA_T19 (SEQ ID NO:101) and HUMCA1XIA_T20 (SEQ ID NO:102). Table 822 below describes the starting and ending position of this segment on each transcript.

TABLE 822 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMCA1XIA_T16 (SEQ ID NO: 99) | 970 | 1098
HUMCA1XIA_T17 (SEQ ID NO: 100) | 970 | 1098
HUMCA1XIA_T19 (SEQ ID NO: 101) | 970 | 1098
HUMCA1XIA_T20 (SEQ ID NO: 102) | 970 | 1098

Segment cluster HUMCA1XIA_node_9 (SEQ ID NO:747) according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T20 (SEQ ID NO:102). Table 823 below describes the starting and ending position of this segment on each transcript.

TABLE 823 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMCA1XIA_T20 (SEQ ID NO: 102) | 1099 | 1271
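The segment coordinates given so far for transcript HUMCA1XIA_T20 (Tables 816 to 818, 820, 822 and 823) can be checked for contiguity with a short sketch. This is an illustrative consistency check only; the helper names are hypothetical, coordinates are assumed to be 1-based and inclusive, and only the segments listed in this section for HUMCA1XIA_T20 are included.

```python
# (segment name, start, end) on HUMCA1XIA_T20, copied from
# Tables 816, 817, 818, 820, 822 and 823.
T20_SEGMENTS = [
    ("node_0", 1, 424),
    ("node_2", 425, 592),
    ("node_4", 593, 806),
    ("node_6", 807, 969),
    ("node_8", 970, 1098),
    ("node_9", 1099, 1271),
]


def find_gaps_or_overlaps(segments):
    """Return (previous, next) name pairs whose coordinates are not adjacent."""
    ordered = sorted(segments, key=lambda seg: seg[1])
    problems = []
    for (prev_name, _, prev_end), (next_name, next_start, _) in zip(ordered, ordered[1:]):
        if next_start != prev_end + 1:
            problems.append((prev_name, next_name))
    return problems


def segment_lengths(segments):
    """Map each segment name to its length in nucleotides (inclusive coordinates)."""
    return {name: end - start + 1 for name, start, end in segments}


print(find_gaps_or_overlaps(T20_SEGMENTS))
print(segment_lengths(T20_SEGMENTS))
```

An empty list from `find_gaps_or_overlaps` indicates that each listed segment begins one position after the previous one ends, so the segments tile the transcript without gaps or overlaps over the covered range.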

Segment cluster HUMCA1XIA_node_18 (SEQ ID NO:748) according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16 (SEQ ID NO:99), HUMCA1XIA_T17 (SEQ ID NO:100) and HUMCA1XIA_T19 (SEQ ID NO:101). Table 824 below describes the starting and ending position of this segment on each transcript.

TABLE 824 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMCA1XIA_T16 (SEQ ID NO: 99) | 1309 | 1522
HUMCA1XIA_T17 (SEQ ID NO: 100) | 1309 | 1522
HUMCA1XIA_T19 (SEQ ID NO: 101) | 1309 | 1522

Segment cluster HUMCA1XIA_node_54 (SEQ ID NO:749) according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T19 (SEQ ID NO:101). Table 825 below describes the starting and ending position of this segment on each transcript.

TABLE 825 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMCA1XIA_T19 (SEQ ID NO: 101) | 2407 | 2836

Segment cluster HUMCA1XIA_node_55 (SEQ ID NO:750) according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T17 (SEQ ID NO:100) and HUMCA1XIA_T19 (SEQ ID NO:101). Table 826 below describes the starting and ending position of this segment on each transcript.

TABLE 826 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMCA1XIA_T17 (SEQ ID NO: 100) | 2461 | 2648
HUMCA1XIA_T19 (SEQ ID NO: 101) | 2837 | 3475

Segment cluster HUMCA1XIA_node_92 (SEQ ID NO:751) according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16 (SEQ ID NO:99). Table 827 below describes the starting and ending position of this segment on each transcript.

TABLE 827 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMCA1XIA_T16 (SEQ ID NO: 99) | 3487 | 3615

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.

Segment cluster HUMCA1XIA_node_11 (SEQ ID NO:752) according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16 (SEQ ID NO:99), HUMCA1XIA_T17 (SEQ ID NO:100) and HUMCA1XIA_T19 (SEQ ID NO:101). Table 828 below describes the starting and ending position of this segment on each transcript.

TABLE 828 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMCA1XIA_T16 (SEQ ID NO: 99) | 1099 | 1215
HUMCA1XIA_T17 (SEQ ID NO: 100) | 1099 | 1215
HUMCA1XIA_T19 (SEQ ID NO: 101) | 1099 | 1215

Segment cluster HUMCA1XIA_node_15 (SEQ ID NO:753) according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16 (SEQ ID NO:99), HUMCA1XIA_T17 (SEQ ID NO:100) and HUMCA1XIA_T19 (SEQ ID NO:101). Table 829 below describes the starting and ending position of this segment on each transcript.

TABLE 829 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMCA1XIA_T16 (SEQ ID NO: 99) | 1216 | 1308
HUMCA1XIA_T17 (SEQ ID NO: 100) | 1216 | 1308
HUMCA1XIA_T19 (SEQ ID NO: 101) | 1216 | 1308

Segment cluster HUMCA1XIA_node_19 (SEQ ID NO:754) according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16 (SEQ ID NO:99), HUMCA1XIA_T17 (SEQ ID NO:100) and HUMCA1XIA_T19 (SEQ ID NO:101). Table 830 below describes the starting and ending position of this segment on each transcript.

TABLE 830 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMCA1XIA_T16 (SEQ ID NO: 99) | 1523 | 1563
HUMCA1XIA_T17 (SEQ ID NO: 100) | 1523 | 1563
HUMCA1XIA_T19 (SEQ ID NO: 101) | 1523 | 1563

Segment cluster HUMCA1XIA_node_21 (SEQ ID NO:755) according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16 (SEQ ID NO:99), HUMCA1XIA_T17 (SEQ ID NO:100) and HUMCA1XIA_T19 (SEQ ID NO:101). Table 831 below describes the starting and ending position of this segment on each transcript.

TABLE 831 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMCA1XIA_T16 (SEQ ID NO: 99) | 1564 | 1626
HUMCA1XIA_T17 (SEQ ID NO: 100) | 1564 | 1626
HUMCA1XIA_T19 (SEQ ID NO: 101) | 1564 | 1626

Segment cluster HUMCA1XIA_node_23 (SEQ ID NO:756) according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16 (SEQ ID NO:99), HUMCA1XIA_T17 (SEQ ID NO:100) and HUMCA1XIA_T19 (SEQ ID NO:101). Table 832 below describes the starting and ending position of this segment on each transcript.

TABLE 832 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMCA1XIA_T16 (SEQ ID NO: 99) | 1627 | 1668
HUMCA1XIA_T17 (SEQ ID NO: 100) | 1627 | 1668
HUMCA1XIA_T19 (SEQ ID NO: 101) | 1627 | 1668

Segment cluster HUMCA1XIA_node_25 (SEQ ID NO:757) according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16 (SEQ ID NO:99), HUMCA1XIA_T17 (SEQ ID NO:100) and HUMCA1XIA_T19 (SEQ ID NO:101). Table 833 below describes the starting and ending position of this segment on each transcript.

TABLE 833 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMCA1XIA_T16 (SEQ ID NO: 99) | 1669 | 1731
HUMCA1XIA_T17 (SEQ ID NO: 100) | 1669 | 1731
HUMCA1XIA_T19 (SEQ ID NO: 101) | 1669 | 1731

Segment cluster HUMCA1XIA_node_27 (SEQ ID NO:758) according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16 (SEQ ID NO:99), HUMCA1XIA_T17 (SEQ ID NO:100) and HUMCA1XIA_T19 (SEQ ID NO:101). Table 834 below describes the starting and ending position of this segment on each transcript.

TABLE 834 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMCA1XIA_T16 (SEQ ID NO: 99) | 1732 | 1806
HUMCA1XIA_T17 (SEQ ID NO: 100) | 1732 | 1806
HUMCA1XIA_T19 (SEQ ID NO: 101) | 1732 | 1806

Segment cluster HUMCA1XIA_node_29 (SEQ ID NO:759) according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16 (SEQ ID NO:99), HUMCA1XIA_T17 (SEQ ID NO:100) and HUMCA1XIA_T19 (SEQ ID NO:101). Table 835 below describes the starting and ending position of this segment on each transcript.

TABLE 835 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMCA1XIA_T16 (SEQ ID NO: 99) | 1807 | 1890
HUMCA1XIA_T17 (SEQ ID NO: 100) | 1807 | 1890
HUMCA1XIA_T19 (SEQ ID NO: 101) | 1807 | 1890

Segment cluster HUMCA1XIA_node_31 (SEQ ID NO:760) according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16 (SEQ ID NO:99), HUMCA1XIA_T17 (SEQ ID NO:100) and HUMCA1XIA_T19 (SEQ ID NO:101). Table 836 below describes the starting and ending position of this segment on each transcript.

TABLE 836 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMCA1XIA_T16 (SEQ ID NO: 99) | 1891 | 1947
HUMCA1XIA_T17 (SEQ ID NO: 100) | 1891 | 1947
HUMCA1XIA_T19 (SEQ ID NO: 101) | 1891 | 1947

Segment cluster HUMCA1XIA_node_33 (SEQ ID NO:761) according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16 (SEQ ID NO:99), HUMCA1XIA_T17 (SEQ ID NO:100) and HUMCA1XIA_T19 (SEQ ID NO:101). Table 837 below describes the starting and ending position of this segment on each transcript.

TABLE 837 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMCA1XIA_T16 (SEQ ID NO: 99) | 1948 | 2001
HUMCA1XIA_T17 (SEQ ID NO: 100) | 1948 | 2001
HUMCA1XIA_T19 (SEQ ID NO: 101) | 1948 | 2001

Segment cluster HUMCA1XIA_node_35 (SEQ ID NO:762) according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16 (SEQ ID NO:99), HUMCA1XIA_T17 (SEQ ID NO:100) and HUMCA1XIA_T19 (SEQ ID NO:101). Table 838 below describes the starting and ending position of this segment on each transcript.

TABLE 838 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMCA1XIA_T16 (SEQ ID NO: 99) | 2002 | 2055
HUMCA1XIA_T17 (SEQ ID NO: 100) | 2002 | 2055
HUMCA1XIA_T19 (SEQ ID NO: 101) | 2002 | 2055

Segment cluster HUMCA1XIA_node_37 (SEQ ID NO:763) according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16 (SEQ ID NO:99), HUMCA1XIA_T17 (SEQ ID NO:100) and HUMCA1XIA_T19 (SEQ ID NO:101). Table 839 below describes the starting and ending position of this segment on each transcript.

TABLE 839 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMCA1XIA_T16 (SEQ ID NO: 99) | 2056 | 2109
HUMCA1XIA_T17 (SEQ ID NO: 100) | 2056 | 2109
HUMCA1XIA_T19 (SEQ ID NO: 101) | 2056 | 2109

Segment cluster HUMCA1XIA_node_39 (SEQ ID NO:764) according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16 (SEQ ID NO:99), HUMCA1XIA_T17 (SEQ ID NO:100) and HUMCA1XIA_T19 (SEQ ID NO:101). Table 840 below describes the starting and ending position of this segment on each transcript.

TABLE 840 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMCA1XIA_T16 (SEQ ID NO: 99) | 2110 | 2163
HUMCA1XIA_T17 (SEQ ID NO: 100) | 2110 | 2163
HUMCA1XIA_T19 (SEQ ID NO: 101) | 2110 | 2163

Segment cluster HUMCA1XIA_node_41 (SEQ ID NO:765) according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16 (SEQ ID NO:99), HUMCA1XIA_T17 (SEQ ID NO:100) and HUMCA1XIA_T19 (SEQ ID NO:101). Table 841 below describes the starting and ending position of this segment on each transcript.

TABLE 841 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMCA1XIA_T16 (SEQ ID NO: 99) | 2164 | 2217
HUMCA1XIA_T17 (SEQ ID NO: 100) | 2164 | 2217
HUMCA1XIA_T19 (SEQ ID NO: 101) | 2164 | 2217

Segment cluster HUMCA1XIA_node_43 (SEQ ID NO:766) according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16 (SEQ ID NO:99), HUMCA1XIA_T17 (SEQ ID NO:100) and HUMCA1XIA_T19 (SEQ ID NO:101). Table 842 below describes the starting and ending position of this segment on each transcript.

TABLE 842 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMCA1XIA_T16 (SEQ ID NO: 99) | 2218 | 2262
HUMCA1XIA_T17 (SEQ ID NO: 100) | 2218 | 2262
HUMCA1XIA_T19 (SEQ ID NO: 101) | 2218 | 2262

Segment cluster HUMCA1XIA_node_45 (SEQ ID NO:767) according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16 (SEQ ID NO:99) and HUMCA1XIA_T17 (SEQ ID NO:100). Table 843 below describes the starting and ending position of this segment on each transcript.

TABLE 843 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMCA1XIA_T16 (SEQ ID NO: 99) | 2263 | 2316
HUMCA1XIA_T17 (SEQ ID NO: 100) | 2263 | 2316

Segment cluster HUMCA1XIA_node_47 (SEQ ID NO:768) according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16 (SEQ ID NO:99), HUMCA1XIA_T17 (SEQ ID NO:100) and HUMCA1XIA_T19 (SEQ ID NO:101). Table 844 below describes the starting and ending position of this segment on each transcript.

TABLE 844 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMCA1XIA_T16 (SEQ ID NO: 99) | 2317 | 2361
HUMCA1XIA_T17 (SEQ ID NO: 100) | 2317 | 2361
HUMCA1XIA_T19 (SEQ ID NO: 101) | 2263 | 2307

Segment cluster HUMCA1XIA_node_49 (SEQ ID NO:769) according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16 (SEQ ID NO:99), HUMCA1XIA_T17 (SEQ ID NO:100) and HUMCA1XIA_T19 (SEQ ID NO:101). Table 845 below describes the starting and ending position of this segment on each transcript.

TABLE 845 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMCA1XIA_T16 (SEQ ID NO: 99) | 2362 | 2415
HUMCA1XIA_T17 (SEQ ID NO: 100) | 2362 | 2415
HUMCA1XIA_T19 (SEQ ID NO: 101) | 2308 | 2361

Segment cluster HUMCA1XIA_node_51 (SEQ ID NO:770) according to the present invention is supported by 7 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16 (SEQ ID NO:99), HUMCA1XIA_T17 (SEQ ID NO:100) and HUMCA1XIA_T19 (SEQ ID NO:101). Table 846 below describes the starting and ending position of this segment on each transcript.

TABLE 846 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMCA1XIA_T16 (SEQ ID NO: 99) | 2416 | 2460
HUMCA1XIA_T17 (SEQ ID NO: 100) | 2416 | 2460
HUMCA1XIA_T19 (SEQ ID NO: 101) | 2362 | 2406

Segment cluster HUMCA1XIA_node_57 (SEQ ID NO:771) according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16 (SEQ ID NO:99). Table 847 below describes the starting and ending position of this segment on each transcript.

TABLE 847 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMCA1XIA_T16 (SEQ ID NO: 99) | 2461 | 2514

Segment cluster HUMCA1XIA_node_59 (SEQ ID NO:772) according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16 (SEQ ID NO:99). Table 848 below describes the starting and ending position of this segment on each transcript.

TABLE 848 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMCA1XIA_T16 (SEQ ID NO: 99) | 2515 | 2559

Segment cluster HUMCA1XIA_node_62 (SEQ ID NO:773) according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16 (SEQ ID NO:99). Table 849 below describes the starting and ending position of this segment on each transcript.

TABLE 849 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMCA1XIA_T16 (SEQ ID NO: 99) | 2560 | 2613

Segment cluster HUMCA1XIA_node_64 (SEQ ID NO:774) according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16 (SEQ ID NO:99). Table 850 below describes the starting and ending position of this segment on each transcript.

TABLE 850 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMCA1XIA_T16 (SEQ ID NO: 99) | 2614 | 2658

Segment cluster HUMCA1XIA_node_66 (SEQ ID NO:775) according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16 (SEQ ID NO:99). Table 851 below describes the starting and ending position of this segment on each transcript.

TABLE 851 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMCA1XIA_T16 (SEQ ID NO: 99) | 2659 | 2712

Segment cluster HUMCA1XIA_node_68 (SEQ ID NO:776) according to the present invention is supported by 7 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16 (SEQ ID NO:99). Table 852 below describes the starting and ending position of this segment on each transcript.

TABLE 852 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMCA1XIA_T16 (SEQ ID NO: 99) | 2713 | 2820

Segment cluster HUMCA1XIA_node_70 (SEQ ID NO:777) according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16 (SEQ ID NO:99). Table 853 below describes the starting and ending position of this segment on each transcript.

TABLE 853 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMCA1XIA_T16 (SEQ ID NO: 99) | 2821 | 2874

Segment cluster HUMCA1XIA_node_72 (SEQ ID NO:778) according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16 (SEQ ID NO:99). Table 854 below describes the starting and ending position of this segment on each transcript.

TABLE 854 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMCA1XIA_T16 (SEQ ID NO: 99) | 2875 | 2928

Segment cluster HUMCA1XIA_node_74 (SEQ ID NO:779) according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16 (SEQ ID NO:99). Table 855 below describes the starting and ending position of this segment on each transcript.

TABLE 855 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMCA1XIA_T16 (SEQ ID NO: 99) | 2929 | 2973

Segment cluster HUMCA1XIA_node_76 (SEQ ID NO:780) according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16 (SEQ ID NO:99). Table 856 below describes the starting and ending position of this segment on each transcript.

TABLE 856 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMCA1XIA_T16 (SEQ ID NO: 99) | 2974 | 3027

Segment cluster HUMCA1XIA_node_78 (SEQ ID NO:782) according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16 (SEQ ID NO:99). Table 857 below describes the starting and ending position of this segment on each transcript.

TABLE 857 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMCA1XIA_T16 (SEQ ID NO: 99) | 3028 | 3072

Segment cluster HUMCA1XIA_node_81 (SEQ ID NO:783) according to the present invention is supported by 8 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16 (SEQ ID NO:99). Table 858 below describes the starting and ending position of this segment on each transcript.

TABLE 858 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMCA1XIA_T16 (SEQ ID NO: 99) | 3073 | 3126

Segment cluster HUMCA1XIA_node_83 (SEQ ID NO:784) according to the present invention is supported by 7 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16 (SEQ ID NO:99). Table 859 below describes the starting and ending position of this segment on each transcript.

TABLE 859 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMCA1XIA_T16 (SEQ ID NO: 99) | 3127 | 3180

Segment cluster HUMCA1XIA_node_85 (SEQ ID NO:785) according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16 (SEQ ID NO:99). Table 860 below describes the starting and ending position of this segment on each transcript.

TABLE 860 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMCA1XIA_T16 (SEQ ID NO: 99) | 3181 | 3234

Segment cluster HUMCA1XIA_node_87 (SEQ ID NO:786) according to the present invention is supported by 10 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16 (SEQ ID NO:99). Table 861 below describes the starting and ending position of this segment on each transcript.

TABLE 861 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMCA1XIA_T16 (SEQ ID NO: 99) | 3235 | 3342

Segment cluster HUMCA1XIA_node_89 (SEQ ID NO:787) according to the present invention is supported by 9 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16 (SEQ ID NO:99). Table 862 below describes the starting and ending position of this segment on each transcript.

TABLE 862 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMCA1XIA_T16 (SEQ ID NO: 99) | 3343 | 3432

Segment cluster HUMCA1XIA_node_91 (SEQ ID NO:788) according to the present invention is supported by 11 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCA1XIA_T16 (SEQ ID NO:99). Table 863 below describes the starting and ending position of this segment on each transcript.

TABLE 863 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMCA1XIA_T16 (SEQ ID NO: 99) | 3433 | 3486

Variant protein alignment to the previously known protein:

Sequence name: CA1B_HUMAN_V5 (SEQ ID NO: 1447)
Sequence documentation: Alignment of: HUMCA1XIA_P14 (SEQ ID NO: 1372) × CA1B_HUMAN_V5 (SEQ ID NO: 1447) . . .
Alignment segment 1/1:
Quality: 10456.00
Escore: 0
Matching length: 1058
Total length: 1058
Matching Percent Similarity: 99.91
Matching Percent Identity: 99.91
Total Percent Similarity: 99.91
Total Percent Identity: 99.91
Gaps: 0
Alignment:

Sequence name: CA1B_HUMAN (SEQ ID NO: 1446)
Sequence documentation: Alignment of: HUMCA1XIA_P15 (SEQ ID NO: 1373) × CA1B_HUMAN (SEQ ID NO: 1446)
Alignment segment 1/1:
Quality: 7073.00
Escore: 0
Matching length: 714
Total length: 714
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

Sequence name: CA1B_HUMAN (SEQ ID NO: 1446)
Sequence documentation: Alignment of: HUMCA1XIA_P16 (SEQ ID NO: 1374) × CA1B_HUMAN (SEQ ID NO: 1446)
Alignment segment 1/1:
Quality: 6795.00
Escore: 0
Matching length: 696
Total length: 714
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 97.48
Total Percent Identity: 97.48
Gaps: 1
Alignment:

Sequence name: CA1B_HUMAN (SEQ ID NO: 1446)
Sequence documentation: Alignment of: HUMCA1XIA_P17 (SEQ ID NO: 1375) × CA1B_HUMAN (SEQ ID NO: 1446)
Alignment segment 1/1:
Quality: 2561.00
Escore: 0
Matching length: 260
Total length: 260
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
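The percentage figures in the alignment summaries above appear to follow from the reported lengths. The sketch below reproduces two of them under stated assumptions: that "Matching Percent Identity" is the number of identical positions divided by the matching length, that "Total Percent Identity" is the same count divided by the total length, and that the HUMCA1XIA_P14 alignment contains exactly one non-identical position (an inference from 99.91%, not a value given in the text). The function name `percent` is hypothetical.

```python
def percent(numerator, denominator):
    """Percentage rounded to two decimals, as printed in the alignment summaries."""
    return round(100.0 * numerator / denominator, 2)


# HUMCA1XIA_P16 x CA1B_HUMAN: matching length 696, total length 714.
# Assumption: a Matching Percent Identity of 100.00 means all 696 matched
# positions are identical.
p16_identical = 696
p16_matching_identity = percent(p16_identical, 696)
p16_total_identity = percent(p16_identical, 714)

# HUMCA1XIA_P14 x CA1B_HUMAN_V5: matching length 1058, total length 1058.
# Assumption: exactly one position differs (inferred, not stated in the text).
p14_identical = 1058 - 1
p14_total_identity = percent(p14_identical, 1058)

print(p16_matching_identity, p16_total_identity, p14_total_identity)
```

Under these assumptions the difference between the matching and total figures for HUMCA1XIA_P16 reflects the 18 positions (714 minus 696) that fall within the single reported gap.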

Expression of Homo sapiens Collagen, Type XI, Alpha 1 (COL11A1) HUMCA1X1A Transcripts which are Detectable by Amplicon as Depicted in Sequence Name HUMCA1X1A Seg55 (SEQ ID NO:1663) in Normal and Cancerous Lung Tissues

Expression of Homo sapiens collagen, type XI, alpha 1 (COL11A1) transcripts detectable by or according to seg55, HUMCA1X1A seg55 amplicon (SEQ ID NO:1663) and primers HUMCA1X1A seg55F (SEQ ID NO:1661) and HUMCA1X1A seg55R (SEQ ID NO:1662) was measured by real time PCR. In parallel, the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323 (SEQ ID NO:1713); amplicon: PBGD-amplicon, SEQ ID NO:334), HPRT1 (GenBank Accession No. NM_000194 (SEQ ID NO:1714); amplicon: HPRT1-amplicon, SEQ ID NO:1297), Ubiquitin (GenBank Accession No. BC000449 (SEQ ID NO:1711); amplicon: Ubiquitin-amplicon, SEQ ID NO:328) and SDHA (GenBank Accession No. NM_004168 (SEQ ID NO:1712); amplicon: SDHA-amplicon, SEQ ID NO:331), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 47-50, 90-93, 96-99, Table 2, above), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples.
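The two-step normalization described above (division by the geometric mean of the housekeeping gene quantities, then division by the median of the normal PM samples) can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, and every quantity in the example is invented for demonstration and is not data from the experiment.

```python
from statistics import geometric_mean, median


def normalize_to_housekeeping(target_quantity, housekeeping_quantities):
    """Divide the amplicon quantity by the geometric mean of the housekeeping quantities."""
    return target_quantity / geometric_mean(housekeeping_quantities)


def fold_up_regulation(normalized, normal_sample_ids):
    """Divide every normalized quantity by the median of the normal samples."""
    baseline = median(normalized[sample] for sample in normal_sample_ids)
    return {sample: value / baseline for sample, value in normalized.items()}


# Hypothetical quantities, invented purely for illustration:
# sample id -> (amplicon quantity, [PBGD, HPRT1, Ubiquitin, SDHA] quantities)
raw = {
    "normal_1": (10.0, [1.0, 1.0, 1.0, 1.0]),
    "normal_2": (20.0, [2.0, 2.0, 2.0, 2.0]),
    "normal_3": (30.0, [1.0, 1.0, 1.0, 1.0]),
    "tumor_1": (400.0, [2.0, 8.0, 4.0, 4.0]),
}
normal_ids = ["normal_1", "normal_2", "normal_3"]

normalized = {
    sample: normalize_to_housekeeping(quantity, housekeeping)
    for sample, (quantity, housekeeping) in raw.items()
}
folds = fold_up_regulation(normalized, normal_ids)
print(folds)
```

Using the median of the normal samples as the baseline, rather than the mean, keeps a single unusually high or low normal sample from shifting the reference level.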

FIG. 67 is a histogram showing over expression of the above-indicated Homo sapiens collagen, type XI, alpha 1 (COL11A1) transcripts in cancerous lung samples relative to the normal samples. Values represent the average of duplicate experiments. Error bars indicate the minimal and maximal values obtained.

As is evident from FIG. 67, the expression of Homo sapiens collagen, type XI, alpha 1 (COL11A1) transcripts detectable by the above amplicon(s) in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 47-50, 90-93, 96-99, Table 2). Notably, an over-expression of at least 5 fold was found in 11 out of 15 adenocarcinoma samples, 11 out of 16 squamous cell carcinoma samples, and in 2 out of 4 large cell carcinoma samples.

Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: HUMCA1X1A seg55F forward primer (SEQ ID NO:1661); and HUMCA1X1A seg55R reverse primer (SEQ ID NO:1662).

The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: HUMCA1X1A seg55 (SEQ ID NO:1663).

Forward primer-HUMCA1X1A seg55F (SEQ ID NO: 1661): TTCTCATAGTATTCCATTGATTGGGTA
Reverse primer-HUMCA1X1A seg55R (SEQ ID NO: 1662): CACCGGTATGGAGAATAGCGA
Amplicon (SEQ ID NO: 1663): TTCTCATAGTATTCCATTGATTGGGTATACCAGGTTCTGTTTACTTTTACTTGGCAGTTGATAGAATAGGTGTAGTTTATACTTTTTCGCTATTCTCCATACCGGTG
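As a consistency check on the sequences listed above, the following minimal sketch tests whether the amplicon begins with the forward primer and ends with the reverse complement of the reverse primer. The helper names are illustrative only; the sequences are copied from the listing above.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

FORWARD_PRIMER = "TTCTCATAGTATTCCATTGATTGGGTA"
REVERSE_PRIMER = "CACCGGTATGGAGAATAGCGA"
AMPLICON = (
    "TTCTCATAGTATTCCATTGATTGGGTATACCAGGTTCTGTTTACTTTTACTTGGCAGTTGATAGAATAGG"
    "TGTAGTTTATACTTTTTCGCTATTCTCCATACCGGTG"
)


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def primers_flank(amplicon, forward, reverse):
    """True if the amplicon starts with the forward primer and ends with
    the reverse complement of the reverse primer."""
    return amplicon.startswith(forward) and amplicon.endswith(
        reverse_complement(reverse)
    )


flanked = primers_flank(AMPLICON, FORWARD_PRIMER, REVERSE_PRIMER)
```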

Description for Cluster T11628

Cluster T11628 features 6 transcript(s) and 25 segment(s) of interest, the names for which are given in Tables 864 and 865, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 866.

TABLE 864 Transcripts of interest
Transcript Name    Sequence ID No.
T11628_PEA_1_T3    103
T11628_PEA_1_T4    104
T11628_PEA_1_T5    105
T11628_PEA_1_T7    106
T11628_PEA_1_T9    107
T11628_PEA_1_T11    108

TABLE 865 Segments of interest
Segment Name    Sequence ID No.
T11628_PEA_1_node_7    789
T11628_PEA_1_node_11    790
T11628_PEA_1_node_16    791
T11628_PEA_1_node_22    792
T11628_PEA_1_node_25    793
T11628_PEA_1_node_31    794
T11628_PEA_1_node_37    795
T11628_PEA_1_node_0    796
T11628_PEA_1_node_4    797
T11628_PEA_1_node_9    798
T11628_PEA_1_node_13    799
T11628_PEA_1_node_14    800
T11628_PEA_1_node_17    801
T11628_PEA_1_node_18    802
T11628_PEA_1_node_19    803
T11628_PEA_1_node_24    804
T11628_PEA_1_node_27    805
T11628_PEA_1_node_28    806
T11628_PEA_1_node_29    807
T11628_PEA_1_node_30    808
T11628_PEA_1_node_32    809
T11628_PEA_1_node_33    810
T11628_PEA_1_node_34    811
T11628_PEA_1_node_35    812
T11628_PEA_1_node_36    813

TABLE 866 Proteins of interest
Protein Name    Sequence ID No.    Corresponding Transcript(s)
T11628_PEA_1_P2    1376    T11628_PEA_1_T3 (SEQ ID NO: 103); T11628_PEA_1_T5 (SEQ ID NO: 105); T11628_PEA_1_T7 (SEQ ID NO: 106)
T11628_PEA_1_P5    1377    T11628_PEA_1_T9 (SEQ ID NO: 107)
T11628_PEA_1_P7    1378    T11628_PEA_1_T11 (SEQ ID NO: 108)
T11628_PEA_1_P10    1379    T11628_PEA_1_T4 (SEQ ID NO: 104)

These sequences are variants of the known protein Myoglobin (SwissProt accession identifier MYG_HUMAN), SEQ ID NO: 1448, referred to herein as the previously known protein.

Protein Myoglobin (SEQ ID NO:1448) is known or believed to have the following function(s): Serves as a reserve supply of oxygen and facilitates the movement of oxygen within muscles. The sequence for protein Myoglobin is given at the end of the application, as “Myoglobin amino acid sequence”. Known polymorphisms for this sequence are as shown in Table 867.

TABLE 867 Amino acid mutations for Known Protein
SNP position(s) on amino acid sequence    Comment
54    E -> K. /FTId = VAR_003180.
133    K -> N. /FTId = VAR_003181.
139    R -> Q. /FTId = VAR_003182.
139    R -> W. /FTId = VAR_003183.
128    Q -> E

As noted above, cluster T11628 features 6 transcript(s), which were listed in Table 864 above. These transcript(s) encode for protein(s) which are variant(s) of protein Myoglobin (SEQ ID NO:1448). A description of each variant protein according to the present invention is now provided.

Variant protein T11628_PEA_1_P2 (SEQ ID NO:1376) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T11628_PEA_1_T3 (SEQ ID NO:103). An alignment is given to the known protein (Myoglobin (SEQ ID NO:1448)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between T11628_PEA_1_P2 (SEQ ID NO:1376) and Q8WVH6 (SEQ ID NO:1450):

1. An isolated chimeric polypeptide encoding for T11628_PEA_1_P2 (SEQ ID NO:1376), comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MGLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDE (SEQ ID NO:1735) corresponding to amino acids 1-55 of T11628_PEA_1_P2 (SEQ ID NO:1376), and a second amino acid sequence being at least 90% homologous to MKASEDLKKHGATVLTALGGILKKKGHHEAEIKPLAQSHATKHKIPVKYLEFISECIIQVLQSKHPGDFGADAQGAMNKALELFRKDMASNYKELGFQG corresponding to amino acids 1-99 of Q8WVH6 (SEQ ID NO:1450), which also corresponds to amino acids 56-154 of T11628_PEA_1_P2 (SEQ ID NO:1376), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a head of T11628_PEA_1_P2 (SEQ ID NO:1376), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MGLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDE (SEQ ID NO:1735) of T11628_PEA_1_P2 (SEQ ID NO:1376).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.

Variant protein T11628_PEA_1_P2 (SEQ ID NO:1376) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 868 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T11628_PEA_1_P2 (SEQ ID NO:1376) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 868 Amino acid mutations
SNP position(s) on amino acid sequence    Alternative amino acid(s)    Previously known SNP?
26    G ->    No
44    F ->    No
92    Q -> R    No
135    A ->    No
141    K ->    No
153    Q ->    No
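The relationship between the comparison report and Table 868 can be checked with a short sketch: the two sequences recited in the comparison report are joined in order, and the residue found at each SNP position is compared with the reference residue listed in the table. The variable and function names are illustrative, and the joined string is an assumption built from the two recited sequences rather than a copy of the sequence listing itself.

```python
# First (head) and second (tail) sequences as recited in the comparison report.
HEAD = "MGLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDE"
TAIL = (
    "MKASEDLKKHGATVLTALGGILKKKGHHEAEIKPLAQSHATKHKIPVKYLEFISECIIQVLQSKHPGDFG"
    "ADAQGAMNKALELFRKDMASNYKELGFQG"
)

# Contiguous, sequential joining of the two sequences (assumed to
# represent the variant protein for the purpose of this check).
ASSEMBLED = HEAD + TAIL

# Reference residues from Table 868, keyed by 1-based amino acid position.
TABLE_868_REFERENCE = {26: "G", 44: "F", 92: "Q", 135: "A", 141: "K", 153: "Q"}


def residue_at(sequence, position):
    """Return the residue at a 1-based position."""
    return sequence[position - 1]


# Positions at which the assembled sequence disagrees with the table.
mismatches = [
    position
    for position, residue in TABLE_868_REFERENCE.items()
    if residue_at(ASSEMBLED, position) != residue
]
```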

Variant protein T11628_PEA_1_P2 (SEQ ID NO:1376) is encoded by the following transcript(s): T11628_PEA_1_T3 (SEQ ID NO:103), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T11628_PEA_1_T3 (SEQ ID NO:103) is shown in bold; this coding portion starts at position 220 and ends at position 681. The transcript also has the following SNPs as listed in Table 869 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T11628_PEA_1_P2 (SEQ ID NO:1376) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 869 Nucleic acid SNPs
SNP position on nucleotide sequence    Alternative nucleic acid    Previously known SNP?
83    G -> A    Yes
93    G -> A    Yes
95    G -> A    Yes
146    G -> A    Yes
295    G ->    No
349    T ->    No
393    G -> A    Yes
423    C -> T    Yes
494    A -> G    No
498    G -> A    No
623    C ->    No
642    G ->    No
678    G ->    No
686    C ->    No
686    C -> A    No
717    C ->    No
787    T -> G    No
820    G -> T    No
826    G -> T    No
850    C ->    No
934    T -> G    No
975    A -> G    Yes
1117    G ->    No
1218    A -> G    No
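The nucleotide positions in Table 869 can be related to the amino acid positions in Table 868 through the coding portion coordinates given above. The following is a minimal sketch, assuming 1-based inclusive coordinates and that the stated coding portion (220 to 681) begins at the first base of the initiator codon; the function name is illustrative.

```python
CDS_START = 220  # first coding position on transcript T11628_PEA_1_T3
CDS_END = 681    # last coding position on transcript T11628_PEA_1_T3


def codon_index(nucleotide_position, cds_start=CDS_START, cds_end=CDS_END):
    """Map a 1-based transcript position to a 1-based codon (amino acid)
    index, or return None if the position lies outside the coding portion."""
    if not cds_start <= nucleotide_position <= cds_end:
        return None
    return (nucleotide_position - cds_start) // 3 + 1


# Number of complete codons spanned by the coding portion.
codon_count = (CDS_END - CDS_START + 1) // 3

# Nucleotide SNP positions from Table 869 mapped to codon indices.
mapped = {position: codon_index(position) for position in (295, 349, 494, 623, 642, 678)}
```

Under these assumptions, the six positions shown map to codons 26, 44, 92, 135, 141 and 153, which are the amino acid positions listed in Table 868, while positions such as 83 or 686 fall outside the coding portion.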

Variant protein T11628_PEA_1_P5 (SEQ ID NO:1377) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T11628_PEA_1_T9 (SEQ ID NO:107). An alignment is given to the known protein (Myoglobin (SEQ ID NO:1448)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between T11628_PEA_1_P5 (SEQ ID NO:1377) and MYG_HUMAN_V1 (SEQ ID NO:1449):

1. An isolated chimeric polypeptide encoding for T11628_PEA_1_P5 (SEQ ID NO:1377), comprising a first amino acid sequence being at least 90% homologous to MKASEDLKKHGATVLTALGGILKKKGHHEAEIKPLAQSHATKHKIPVKYLEFISECIIQVLQSKHPGDFGADAQGAMNKALELFRKDMASNYKELGFQG corresponding to amino acids 56-154 of MYG_HUMAN_V1 (SEQ ID NO:1449), which also corresponds to amino acids 1-99 of T11628_PEA_1_P5 (SEQ ID NO:1377).

It should be noted that the known protein sequence (MYG_HUMAN (SEQ ID NO:1448)) has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for MYG_HUMAN_V1 (SEQ ID NO:1449). These changes were previously known to occur and are listed in the table below.

TABLE 870 Changes to MYG_HUMAN_V1 (SEQ ID NO: 1449)
SNP position(s) on amino acid sequence    Type of change
1    init_met

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.

Variant protein T11628_PEA_1_P5 (SEQ ID NO:1377) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 871 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T11628_PEA_1_P5 (SEQ ID NO:1377) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 871 Amino acid mutations
SNP position(s) on amino acid sequence    Alternative amino acid(s)    Previously known SNP?
37    Q -> R    No
80    A ->    No
86    K ->    No
98    Q ->    No

Variant protein T11628_PEA_1_P5 (SEQ ID NO:1377) is encoded by the following transcript(s): T11628_PEA_1_T9 (SEQ ID NO:107), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T11628_PEA_1_T9 (SEQ ID NO:107) is shown in bold; this coding portion starts at position 211 and ends at position 507. The transcript also has the following SNPs as listed in Table 872 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T11628_PEA_1_P5 (SEQ ID NO:1377) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 872 Nucleic acid SNPs
SNP position on nucleotide sequence    Alternative nucleic acid    Previously known SNP?
2    C -> T    Yes
175    T ->    No
219    G -> A    Yes
249    C -> T    Yes
320    A -> G    No
324    G -> A    No
449    C ->    No
468    G ->    No
504    G ->    No
512    C ->    No
512    C -> A    No
543    C ->    No
613    T -> G    No
646    G -> T    No
652    G -> T    No
676    C ->    No
760    T -> G    No
801    A -> G    Yes
943    G ->    No
1044    A -> G    No

Variant protein T11628_PEA_1_P7 (SEQ ID NO:1378) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T11628_PEA_1_T11 (SEQ ID NO:108). An alignment is given to the known protein (Myoglobin (SEQ ID NO:1448)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between T11628_PEA_1_P7 (SEQ ID NO:1378) and MYG_HUMAN_V1 (SEQ ID NO:1449):

1. An isolated chimeric polypeptide encoding for T11628_PEA_1_P7 (SEQ ID NO:1378), comprising a first amino acid sequence being at least 90% homologous to MGLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKASEDLKKHGATVLTALGGILKKKGHHEAEIKPLAQSHATKHKIPVKYLEFISECIIQVLQSKHPGDFGADAQGAMNK corresponding to amino acids 1-134 of MYG_HUMAN_V1 (SEQ ID NO:1449), which also corresponds to amino acids 1-134 of T11628_PEA_1_P7 (SEQ ID NO:1378), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence G corresponding to amino acids 135-135 of T11628_PEA_1_P7 (SEQ ID NO:1378), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

It should be noted that the known protein sequence (MYG_HUMAN (SEQ ID NO:1448)) has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for MYG_HUMAN_V1 (SEQ ID NO:1449). These changes were previously known to occur and are listed in the table below.

TABLE 873 Changes to MYG_HUMAN_V1 (SEQ ID NO: 1449)
SNP position(s) on amino acid sequence    Type of change
1    init_met

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.

Variant protein T11628_PEA_1_P7 (SEQ ID NO:1378) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 874 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T11628_PEA_1_P7 (SEQ ID NO:1378) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 874 Amino acid mutations
SNP position(s) on amino acid sequence    Alternative amino acid(s)    Previously known SNP?
26    G ->    No
44    F ->    No
92    Q -> R    No

Variant protein T11628_PEA_1_P7 (SEQ ID NO:1378) is encoded by the following transcript(s): T11628_PEA_1_T11 (SEQ ID NO:108), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T11628_PEA_1_T11 (SEQ ID NO:108) is shown in bold; this coding portion starts at position 319 and ends at position 723. The transcript also has the following SNPs as listed in Table 875 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T11628_PEA_1_P7 (SEQ ID NO:1378) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 875 Nucleic acid SNPs
SNP position on nucleotide sequence    Alternative nucleic acid    Previously known SNP?
394    G ->    No
448    T ->    No
492    G -> A    Yes
522    C -> T    Yes
593    A -> G    No
597    G -> A    No
728    C ->    No
728    C -> A    No
759    C ->    No
829    T -> G    No
862    G -> T    No
868    G -> T    No
892    C ->    No
976    T -> G    No
1017    A -> G    Yes
1159    G ->    No
1260    A -> G    No

Variant protein T11628_PEA_1_P10 (SEQ ID NO:1379) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T11628_PEA_1_T4 (SEQ ID NO:104). An alignment is given to the known protein (Myoglobin (SEQ ID NO:1448)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between T11628_PEA_1_P10 (SEQ ID NO:1379) and Q8WVH6 (SEQ ID NO:1450):

1. An isolated chimeric polypeptide encoding for T11628_PEA_1_P10 (SEQ ID NO:1379), comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MGLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDE (SEQ ID NO:1735) corresponding to amino acids 1-55 of T11628_PEA_1_P10 (SEQ ID NO:1379), and a second amino acid sequence being at least 90% homologous to MKASEDLKKHGATVLTALGGILKKKGHHEAEIKPLAQSHATKHKIPVKYLEFISECIIQVLQSKHPGDFGADAQGAMNKALELFRKDMASNYKELGFQG corresponding to amino acids 1-99 of Q8WVH6 (SEQ ID NO:1450), which also corresponds to amino acids 56-154 of T11628_PEA_1_P10 (SEQ ID NO:1379), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a head of T11628_PEA_1_P10 (SEQ ID NO:1379), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MGLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDE (SEQ ID NO:1735) of T11628_PEA_1_P10 (SEQ ID NO:1379).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.

Variant protein T11628_PEA_1_P10 (SEQ ID NO:1379) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 876 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T11628_PEA_1_P10 (SEQ ID NO:1379) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 876 Amino acid mutations
SNP position(s) on amino acid sequence    Alternative amino acid(s)    Previously known SNP?
26    G ->    No
44    F ->    No
92    Q -> R    No
135    A ->    No
141    K ->    No
153    Q ->    No

Variant protein T11628_PEA_1_P10 (SEQ ID NO:1379) is encoded by the following transcript(s): T11628_PEA_1_T4 (SEQ ID NO:104), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T11628_PEA_1_T4 (SEQ ID NO:104) is shown in bold; this coding portion starts at position 205 and ends at position 666. The transcript also has the following SNPs as listed in Table 877 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T11628_PEA_1_P10 (SEQ ID NO:1379) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 877 Nucleic acid SNPs
SNP position on nucleotide sequence    Alternative nucleic acid    Previously known SNP?
280    G ->    No
334    T ->    No
378    G -> A    Yes
408    C -> T    Yes
479    A -> G    No
483    G -> A    No
608    C ->    No
627    G ->    No
663    G ->    No
671    C ->    No
671    C -> A    No
702    C ->    No
772    T -> G    No
805    G -> T    No
811    G -> T    No
835    C ->    No
919    T -> G    No
960    A -> G    Yes
1102    G ->    No
1203    A -> G    No

As noted above, cluster T11628 features 25 segment(s), which were listed in Table 865 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster T11628_PEA_1_node_7 (SEQ ID NO:789) according to the present invention is supported by 9 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T11628_PEA_1_T3 (SEQ ID NO:103). Table 878 below describes the starting and ending position of this segment on each transcript.

TABLE 878 Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
T11628_PEA_1_T3 (SEQ ID NO: 103)    1    211

Segment cluster T11628_PEA_1_node_11 (SEQ ID NO:790) according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T11628_PEA_1_T5 (SEQ ID NO:105). Table 879 below describes the starting and ending position of this segment on each transcript.

TABLE 879 Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
T11628_PEA_1_T5 (SEQ ID NO: 105)    48    178

Segment cluster T11628_PEA_1_node_16 (SEQ ID NO:791) according to the present invention is supported by 38 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T11628_PEA_1_T11 (SEQ ID NO:108). Table 880 below describes the starting and ending position of this segment on each transcript.

TABLE 880 Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
T11628_PEA_1_T11 (SEQ ID NO: 108)    1    214

Segment cluster T11628_PEA_1_node_22 (SEQ ID NO:792) according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T11628_PEA_1_T9 (SEQ ID NO:107). Table 881 below describes the starting and ending position of this segment on each transcript.

TABLE 881 Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
T11628_PEA_1_T9 (SEQ ID NO: 107)    1    140

Segment cluster T11628_PEA_1_node_25 (SEQ ID NO:793) according to the present invention is supported by 129 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T11628_PEA_1_T3 (SEQ ID NO:103), T11628_PEA_1_T4 (SEQ ID NO:104), T11628_PEA_1_T5 (SEQ ID NO:105), T11628_PEA_1_T7 (SEQ ID NO:106), T11628_PEA_1_T9 (SEQ ID NO:107) and T11628_PEA_1_T11 (SEQ ID NO:108). Table 882 below describes the starting and ending position of this segment on each transcript.

TABLE 882 Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
T11628_PEA_1_T3 (SEQ ID NO: 103)    395    537
T11628_PEA_1_T4 (SEQ ID NO: 104)    380    522
T11628_PEA_1_T5 (SEQ ID NO: 105)    362    504
T11628_PEA_1_T7 (SEQ ID NO: 106)    347    489
T11628_PEA_1_T9 (SEQ ID NO: 107)    221    363
T11628_PEA_1_T11 (SEQ ID NO: 108)    494    636
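Because the same segment occupies a different coordinate range on each transcript, a position within the segment can be converted from one transcript to another by a simple offset. The following is a minimal sketch using the coordinates of Table 882; the short transcript labels and the function name are illustrative, and 1-based inclusive coordinates are assumed.

```python
# Start and end of segment node_25 on each transcript (Table 882).
NODE_25 = {
    "T3": (395, 537),
    "T4": (380, 522),
    "T5": (362, 504),
    "T7": (347, 489),
    "T9": (221, 363),
    "T11": (494, 636),
}


def translate_position(position, source, target, spans=NODE_25):
    """Convert a position inside the shared segment from the source
    transcript's coordinates to the target transcript's coordinates."""
    source_start, source_end = spans[source]
    if not source_start <= position <= source_end:
        raise ValueError("position lies outside the segment on the source transcript")
    return spans[target][0] + (position - source_start)


# The segment should have the same length on every transcript.
segment_lengths = {end - start + 1 for start, end in NODE_25.values()}

# Example: the C -> T SNP at position 423 of T3 (Table 869).
snp_on_t9 = translate_position(423, "T3", "T9")
snp_on_t11 = translate_position(423, "T3", "T11")
snp_on_t4 = translate_position(423, "T3", "T4")
```

With these coordinates, position 423 on T11628_PEA_1_T3 corresponds to 249, 522 and 408 on transcripts T9, T11 and T4, which are the positions at which Tables 872, 875 and 877 list the same C -> T change.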

Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment (in relation to lung cancer), shown in Table 883.

TABLE 883 Oligonucleotides related to this segment
Oligonucleotide name    Overexpressed in cancers    Chip reference
T11628_0_9_0 (SEQ ID NO: 237)    lung malignant tumors    LUN

Segment cluster T11628_PEA_1_node_31 (SEQ ID NO:794) according to the present invention is supported by 137 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T11628_PEA_1_T3 (SEQ ID NO:103), T11628_PEA_1_T4 (SEQ ID NO:104), T11628_PEA_1_T5 (SEQ ID NO:105), T11628_PEA_1_T7 (SEQ ID NO:106), T11628_PEA_1_T9 (SEQ ID NO:107) and T11628_PEA_1_T11 (SEQ ID NO:108). Table 884 below describes the starting and ending position of this segment on each transcript.

TABLE 884 Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
T11628_PEA_1_T3 (SEQ ID NO: 103)    702    831
T11628_PEA_1_T4 (SEQ ID NO: 104)    687    816
T11628_PEA_1_T5 (SEQ ID NO: 105)    669    798
T11628_PEA_1_T7 (SEQ ID NO: 106)    654    783
T11628_PEA_1_T9 (SEQ ID NO: 107)    528    657
T11628_PEA_1_T11 (SEQ ID NO: 108)    744    873

Segment cluster T11628_PEA_1_node_37 (SEQ ID NO:795) according to the present invention is supported by 99 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T11628_PEA_1_T3 (SEQ ID NO:103), T11628_PEA_1_T4 (SEQ ID NO:104), T11628_PEA_1_T5 (SEQ ID NO:105), T11628_PEA_1_T7 (SEQ ID NO:106), T11628_PEA_1_T9 (SEQ ID NO:107) and T11628_PEA_1_T11 (SEQ ID NO:108). Table 885 below describes the starting and ending position of this segment on each transcript.

TABLE 885 Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
T11628_PEA_1_T3 (SEQ ID NO: 103)    1086    1225
T11628_PEA_1_T4 (SEQ ID NO: 104)    1071    1210
T11628_PEA_1_T5 (SEQ ID NO: 105)    1053    1192
T11628_PEA_1_T7 (SEQ ID NO: 106)    1038    1177
T11628_PEA_1_T9 (SEQ ID NO: 107)    912    1051
T11628_PEA_1_T11 (SEQ ID NO: 108)    1128    1267

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.

Segment cluster T11628_PEA_1_node_0 (SEQ ID NO:796) according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T11628_PEA_1_T4 (SEQ ID NO:104). Table 886 below describes the starting and ending position of this segment on each transcript.

TABLE 886 Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
T11628_PEA_1_T4 (SEQ ID NO: 104)    1    93

Segment cluster T11628_PEA_1_node_4 (SEQ ID NO:797) according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T11628_PEA_1_T4 (SEQ ID NO:104). Table 887 below describes the starting and ending position of this segment on each transcript.

TABLE 887 Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
T11628_PEA_1_T4 (SEQ ID NO: 104)    94    196

Segment cluster T11628_PEA_1_node_9 (SEQ ID NO:798) according to the present invention is supported by 16 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T11628_PEA_1_T5 (SEQ ID NO:105) and T11628_PEA_1_T7 (SEQ ID NO:106). Table 888 below describes the starting and ending position of this segment on each transcript.

TABLE 888 Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
T11628_PEA_1_T5 (SEQ ID NO: 105)    1    47
T11628_PEA_1_T7 (SEQ ID NO: 106)    1    47

Segment cluster T11628_PEA_1_node_13 (SEQ ID NO:799) according to the present invention can be found in the following transcript(s): T11628_PEA_1_T7 (SEQ ID NO:106). Table 889 below describes the starting and ending position of this segment on each transcript.

TABLE 889 Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
T11628_PEA_1_T7 (SEQ ID NO: 106)    48    65

Segment cluster T11628_PEA_1_node_14 (SEQ ID NO:800) according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T11628_PEA_1_T7 (SEQ ID NO:106). Table 890 below describes the starting and ending position of this segment on each transcript.

TABLE 890 Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
T11628_PEA_1_T7 (SEQ ID NO: 106)    66    163

Segment cluster T11628_PEA_1_node_17 (SEQ ID NO:801) according to the present invention is supported by 55 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T11628_PEA_1_T11 (SEQ ID NO:108). Table 891 below describes the starting and ending position of this segment on each transcript.

TABLE 891 Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
T11628_PEA_1_T11 (SEQ ID NO: 108)    215    310

Segment cluster T11628_PEA_1_node_18 (SEQ ID NO:802) according to the present invention is supported by 98 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T11628_PEA_1_T3 (SEQ ID NO:103), T11628_PEA_1_T4 (SEQ ID NO:104), T11628_PEA_1_T5 (SEQ ID NO:105), T11628_PEA_1_T7 (SEQ ID NO:106) and T11628_PEA_1_T11 (SEQ ID NO:108). Table 892 below describes the starting and ending position of this segment on each transcript.

TABLE 892 Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
T11628_PEA_1_T3 (SEQ ID NO: 103)    212    289
T11628_PEA_1_T4 (SEQ ID NO: 104)    197    274
T11628_PEA_1_T5 (SEQ ID NO: 105)    179    256
T11628_PEA_1_T7 (SEQ ID NO: 106)    164    241
T11628_PEA_1_T11 (SEQ ID NO: 108)    311    388

Segment cluster T11628_PEA_1_node_19 (SEQ ID NO:803) according to the present invention can be found in the following transcript(s): T11628_PEA_1_T3 (SEQ ID NO:103), T11628_PEA_1_T4 (SEQ ID NO:104), T11628_PEA_1_T5 (SEQ ID NO:105), T11628_PEA_1_T7 (SEQ ID NO:106) and T11628_PEA_1_T11 (SEQ ID NO:108). Table 893 below describes the starting and ending position of this segment on each transcript.

TABLE 893 Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
T11628_PEA_1_T3 (SEQ ID NO: 103)    290    314
T11628_PEA_1_T4 (SEQ ID NO: 104)    275    299
T11628_PEA_1_T5 (SEQ ID NO: 105)    257    281
T11628_PEA_1_T7 (SEQ ID NO: 106)    242    266
T11628_PEA_1_T11 (SEQ ID NO: 108)    389    413

Segment cluster T11628_PEA_1_node_24 (SEQ ID NO:804) according to the present invention is supported by 112 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T11628_PEA_1_T3 (SEQ ID NO:103), T11628_PEA_1_T4 (SEQ ID NO:104), T11628_PEA_1_T5 (SEQ ID NO:105), T11628_PEA_1_T7 (SEQ ID NO:106), T11628_PEA_1_T9 (SEQ ID NO:107) and T11628_PEA_1_T11 (SEQ ID NO:108). Table 894 below describes the starting and ending position of this segment on each transcript.

TABLE 894 Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
T11628_PEA_1_T3 (SEQ ID NO: 103)    315    394
T11628_PEA_1_T4 (SEQ ID NO: 104)    300    379
T11628_PEA_1_T5 (SEQ ID NO: 105)    282    361
T11628_PEA_1_T7 (SEQ ID NO: 106)    267    346
T11628_PEA_1_T9 (SEQ ID NO: 107)    141    220
T11628_PEA_1_T11 (SEQ ID NO: 108)    414    493

Segment cluster T11628_PEA_1_node_27 (SEQ ID NO:805) according to the present invention is supported by 119 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T11628_PEA_1_T3 (SEQ ID NO:103), T11628_PEA_1_T4 (SEQ ID NO:104), T11628_PEA_1_T5 (SEQ ID NO:105), T11628_PEA_1_T7 (SEQ ID NO:106), T11628_PEA_1_T9 (SEQ ID NO:107) and T11628_PEA_1_T11 (SEQ ID NO:108). Table 895 below describes the starting and ending position of this segment on each transcript.

TABLE 895 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
T11628_PEA_1_T3 (SEQ ID NO: 103)     538                         621
T11628_PEA_1_T4 (SEQ ID NO: 104)     523                         606
T11628_PEA_1_T5 (SEQ ID NO: 105)     505                         588
T11628_PEA_1_T7 (SEQ ID NO: 106)     490                         573
T11628_PEA_1_T9 (SEQ ID NO: 107)     364                         447
T11628_PEA_1_T11 (SEQ ID NO: 108)    637                         720

Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment (in relation to lung cancer), shown in Table 896.

TABLE 896 Oligonucleotides related to this segment

Oligonucleotide name              Overexpressed in cancers   Chip reference
T11628_0_9_0 (SEQ ID NO: 237)     lung malignant tumors      LUN

Segment cluster T11628_PEA_1_node_28 (SEQ ID NO:806) according to the present invention is supported by 115 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T11628_PEA_1_T3 (SEQ ID NO:103), T11628_PEA_1_T4 (SEQ ID NO:104), T11628_PEA_1_T5 (SEQ ID NO:105), T11628_PEA_1_T7 (SEQ ID NO:106) and T11628_PEA_1_T9 (SEQ ID NO:107). Table 897 below describes the starting and ending position of this segment on each transcript.

TABLE 897 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
T11628_PEA_1_T3 (SEQ ID NO: 103)     622                         650
T11628_PEA_1_T4 (SEQ ID NO: 104)     607                         635
T11628_PEA_1_T5 (SEQ ID NO: 105)     589                         617
T11628_PEA_1_T7 (SEQ ID NO: 106)     574                         602
T11628_PEA_1_T9 (SEQ ID NO: 107)     448                         476

Segment cluster T11628_PEA_1_node_29 (SEQ ID NO:807) according to the present invention is supported by 113 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T11628_PEA_1_T3 (SEQ ID NO:103), T11628_PEA_1_T4 (SEQ ID NO:104), T11628_PEA_1_T5 (SEQ ID NO:105), T11628_PEA_1_T7 (SEQ ID NO:106) and T11628_PEA_1_T9 (SEQ ID NO:107). Table 898 below describes the starting and ending position of this segment on each transcript.

TABLE 898 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
T11628_PEA_1_T3 (SEQ ID NO: 103)     651                         678
T11628_PEA_1_T4 (SEQ ID NO: 104)     636                         663
T11628_PEA_1_T5 (SEQ ID NO: 105)     618                         645
T11628_PEA_1_T7 (SEQ ID NO: 106)     603                         630
T11628_PEA_1_T9 (SEQ ID NO: 107)     477                         504

Segment cluster T11628_PEA_1_node_30 (SEQ ID NO:808) according to the present invention can be found in the following transcript(s): T11628_PEA_1_T3 (SEQ ID NO:103), T11628_PEA_1_T4 (SEQ ID NO:104), T11628_PEA_1_T5 (SEQ ID NO:105), T11628_PEA_1_T7 (SEQ ID NO:106), T11628_PEA_1_T9 (SEQ ID NO:107) and T11628_PEA_1_T11 (SEQ ID NO:108). Table 899 below describes the starting and ending position of this segment on each transcript.

TABLE 899 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
T11628_PEA_1_T3 (SEQ ID NO: 103)     679                         701
T11628_PEA_1_T4 (SEQ ID NO: 104)     664                         686
T11628_PEA_1_T5 (SEQ ID NO: 105)     646                         668
T11628_PEA_1_T7 (SEQ ID NO: 106)     631                         653
T11628_PEA_1_T9 (SEQ ID NO: 107)     505                         527
T11628_PEA_1_T11 (SEQ ID NO: 108)    721                         743

Segment cluster T11628_PEA_1_node_32 (SEQ ID NO:809) according to the present invention can be found in the following transcript(s): T11628_PEA_1_T3 (SEQ ID NO:103), T11628_PEA_1_T4 (SEQ ID NO:104), T11628_PEA_1_T5 (SEQ ID NO:105), T11628_PEA_1_T7 (SEQ ID NO:106), T11628_PEA_1_T9 (SEQ ID NO:107) and T11628_PEA_1_T11 (SEQ ID NO:108). Table 900 below describes the starting and ending position of this segment on each transcript.

TABLE 900 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
T11628_PEA_1_T3 (SEQ ID NO: 103)     832                         844
T11628_PEA_1_T4 (SEQ ID NO: 104)     817                         829
T11628_PEA_1_T5 (SEQ ID NO: 105)     799                         811
T11628_PEA_1_T7 (SEQ ID NO: 106)     784                         796
T11628_PEA_1_T9 (SEQ ID NO: 107)     658                         670
T11628_PEA_1_T11 (SEQ ID NO: 108)    874                         886

Segment cluster T11628_PEA_1_node_33 (SEQ ID NO:810) according to the present invention can be found in the following transcript(s): T11628_PEA_1_T3 (SEQ ID NO:103), T11628_PEA_1_T4 (SEQ ID NO:104), T11628_PEA_1_T5 (SEQ ID NO:105), T11628_PEA_1_T7 (SEQ ID NO:106), T11628_PEA_1_T9 (SEQ ID NO:107) and T11628_PEA_1_T11 (SEQ ID NO:108). Table 901 below describes the starting and ending position of this segment on each transcript.

TABLE 901 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
T11628_PEA_1_T3 (SEQ ID NO: 103)     845                         866
T11628_PEA_1_T4 (SEQ ID NO: 104)     830                         851
T11628_PEA_1_T5 (SEQ ID NO: 105)     812                         833
T11628_PEA_1_T7 (SEQ ID NO: 106)     797                         818
T11628_PEA_1_T9 (SEQ ID NO: 107)     671                         692
T11628_PEA_1_T11 (SEQ ID NO: 108)    887                         908

Segment cluster T11628_PEA_1_node_34 (SEQ ID NO:811) according to the present invention is supported by 122 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T11628_PEA_1_T3 (SEQ ID NO:103), T11628_PEA_1_T4 (SEQ ID NO:104), T11628_PEA_1_T5 (SEQ ID NO:105), T11628_PEA_1_T7 (SEQ ID NO:106), T11628_PEA_1_T9 (SEQ ID NO:107) and T11628_PEA_1_T11 (SEQ ID NO:108). Table 902 below describes the starting and ending position of this segment on each transcript.

TABLE 902 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
T11628_PEA_1_T3 (SEQ ID NO: 103)     867                         911
T11628_PEA_1_T4 (SEQ ID NO: 104)     852                         896
T11628_PEA_1_T5 (SEQ ID NO: 105)     834                         878
T11628_PEA_1_T7 (SEQ ID NO: 106)     819                         863
T11628_PEA_1_T9 (SEQ ID NO: 107)     693                         737
T11628_PEA_1_T11 (SEQ ID NO: 108)    909                         953

Segment cluster T11628_PEA_1_node_35 (SEQ ID NO:812) according to the present invention is supported by 126 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T11628_PEA_1_T3 (SEQ ID NO:103), T11628_PEA_1_T4 (SEQ ID NO:104), T11628_PEA_1_T5 (SEQ ID NO:105), T11628_PEA_1_T7 (SEQ ID NO:106), T11628_PEA_1_T9 (SEQ ID NO:107) and T11628_PEA_1_T11 (SEQ ID NO:108). Table 903 below describes the starting and ending position of this segment on each transcript.

TABLE 903 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
T11628_PEA_1_T3 (SEQ ID NO: 103)     912                         967
T11628_PEA_1_T4 (SEQ ID NO: 104)     897                         952
T11628_PEA_1_T5 (SEQ ID NO: 105)     879                         934
T11628_PEA_1_T7 (SEQ ID NO: 106)     864                         919
T11628_PEA_1_T9 (SEQ ID NO: 107)     738                         793
T11628_PEA_1_T11 (SEQ ID NO: 108)    954                         1009

Segment cluster T11628_PEA_1_node_36 (SEQ ID NO:813) according to the present invention is supported by 122 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T11628_PEA_1_T3 (SEQ ID NO:103), T11628_PEA_1_T4 (SEQ ID NO:104), T11628_PEA_1_T5 (SEQ ID NO:105), T11628_PEA_1_T7 (SEQ ID NO:106), T11628_PEA_1_T9 (SEQ ID NO:107) and T11628_PEA_1_T11 (SEQ ID NO:108). Table 904 below describes the starting and ending position of this segment on each transcript.

TABLE 904 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
T11628_PEA_1_T3 (SEQ ID NO: 103)     968                         1085
T11628_PEA_1_T4 (SEQ ID NO: 104)     953                         1070
T11628_PEA_1_T5 (SEQ ID NO: 105)     935                         1052
T11628_PEA_1_T7 (SEQ ID NO: 106)     920                         1037
T11628_PEA_1_T9 (SEQ ID NO: 107)     794                         911
T11628_PEA_1_T11 (SEQ ID NO: 108)    1010                        1127

Variant protein alignment to the previously known protein:

Sequence name: Q8WVH6 (SEQ ID NO: 1450)
Sequence documentation: Alignment of: T11628_PEA_1_P2 (SEQ ID NO: 1376) × Q8WVH6 (SEQ ID NO: 1450)
Alignment segment 1/1:
Quality: 962.00
Escore: 0
Matching length: 99
Total length: 99
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

Sequence name: MYG_HUMAN_V1 (SEQ ID NO: 1449)
Sequence documentation: Alignment of: T11628_PEA_1_P5 (SEQ ID NO: 1377) × MYG_HUMAN_V1 (SEQ ID NO: 1449) . . .
Alignment segment 1/1:
Quality: 962.00
Escore: 0
Matching length: 99
Total length: 99
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

Sequence name: MYG_HUMAN_V1 (SEQ ID NO: 1449)
Sequence documentation: Alignment of: T11628_PEA_1_P7 (SEQ ID NO: 1378) × MYG_HUMAN_V1 (SEQ ID NO: 1449) . . .
Alignment segment 1/1:
Quality: 1315.00
Escore: 0
Matching length: 134
Total length: 134
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

Sequence name: Q8WVH6 (SEQ ID NO: 1450)
Sequence documentation: Alignment of: T11628_PEA_1_P10 (SEQ ID NO: 1379) × Q8WVH6 (SEQ ID NO: 1450)
Alignment segment 1/1:
Quality: 962.00
Escore: 0
Matching length: 99
Total length: 99
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

Description for Cluster HUMCEA

Cluster HUMCEA features 5 transcript(s) and 42 segment(s) of interest, the names for which are given in Tables 905 and 906, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 907.

TABLE 905 Transcripts of interest

Transcript Name       Sequence ID No.
HUMCEA_PEA_1_T8       109
HUMCEA_PEA_1_T9       110
HUMCEA_PEA_1_T20      111
HUMCEA_PEA_1_T25      112
HUMCEA_PEA_1_T26      113

TABLE 906 Segments of interest

Segment Name             Sequence ID No.
HUMCEA_PEA_1_node_0      814
HUMCEA_PEA_1_node_2      815
HUMCEA_PEA_1_node_11     816
HUMCEA_PEA_1_node_12     817
HUMCEA_PEA_1_node_31     818
HUMCEA_PEA_1_node_36     819
HUMCEA_PEA_1_node_44     820
HUMCEA_PEA_1_node_46     821
HUMCEA_PEA_1_node_63     822
HUMCEA_PEA_1_node_65     823
HUMCEA_PEA_1_node_67     824
HUMCEA_PEA_1_node_3      825
HUMCEA_PEA_1_node_7      826
HUMCEA_PEA_1_node_8      827
HUMCEA_PEA_1_node_9      828
HUMCEA_PEA_1_node_10     829
HUMCEA_PEA_1_node_15     830
HUMCEA_PEA_1_node_16     831
HUMCEA_PEA_1_node_17     832
HUMCEA_PEA_1_node_18     833
HUMCEA_PEA_1_node_19     834
HUMCEA_PEA_1_node_20     835
HUMCEA_PEA_1_node_21     836
HUMCEA_PEA_1_node_22     837
HUMCEA_PEA_1_node_23     838
HUMCEA_PEA_1_node_24     839
HUMCEA_PEA_1_node_27     840
HUMCEA_PEA_1_node_29     841
HUMCEA_PEA_1_node_30     842
HUMCEA_PEA_1_node_33     843
HUMCEA_PEA_1_node_34     844
HUMCEA_PEA_1_node_35     845
HUMCEA_PEA_1_node_45     846
HUMCEA_PEA_1_node_50     847
HUMCEA_PEA_1_node_51     848
HUMCEA_PEA_1_node_56     849
HUMCEA_PEA_1_node_57     850
HUMCEA_PEA_1_node_58     851
HUMCEA_PEA_1_node_60     852
HUMCEA_PEA_1_node_61     853
HUMCEA_PEA_1_node_62     854
HUMCEA_PEA_1_node_64     855

TABLE 907 Proteins of interest

Protein Name          Sequence ID No.   Corresponding Transcript(s)
HUMCEA_PEA_1_P4       1380              HUMCEA_PEA_1_T8 (SEQ ID NO: 109)
HUMCEA_PEA_1_P5       1381              HUMCEA_PEA_1_T9 (SEQ ID NO: 110)
HUMCEA_PEA_1_P14      1382              HUMCEA_PEA_1_T20 (SEQ ID NO: 111)
HUMCEA_PEA_1_P19      1383              HUMCEA_PEA_1_T25 (SEQ ID NO: 112)
HUMCEA_PEA_1_P20      1384              HUMCEA_PEA_1_T26 (SEQ ID NO: 113)

These sequences are variants of the known protein Carcinoembryonic antigen-related cell adhesion molecule 5 precursor (SwissProt accession identifier CEA5_HUMAN; known also according to the synonyms Carcinoembryonic antigen; CEA; Meconium antigen 100; CD66e antigen), SEQ ID NO: 1451, referred to herein as the previously known protein.

The sequence for protein Carcinoembryonic antigen-related cell adhesion molecule 5 precursor (SEQ ID NO:1451) is given at the end of the application, as “Carcinoembryonic antigen-related cell adhesion molecule 5 precursor amino acid sequence”. Known polymorphisms for this sequence are as shown in Table 908.

TABLE 908 Amino acid mutations for Known Protein

SNP position(s) on amino acid sequence   Comment
320                                      Missing

Protein Carcinoembryonic antigen-related cell adhesion molecule 5 precursor (SEQ ID NO:1451) is believed to be attached to the membrane by a GPI-anchor.

The previously known protein also has the following indication(s) and/or potential therapeutic use(s): Cancer. It has been investigated for clinical/therapeutic use in humans, for example as a target for an antibody or small molecule, and/or as a direct therapeutic; available information related to these investigations is as follows. Potential pharmaceutically related or therapeutically related activity or activities of the previously known protein are as follows: Immunostimulant. A therapeutic role for a protein represented by the cluster has been predicted. The cluster was assigned this field because there was information in the drug database or the public databases (e.g., described herein above) that this protein, or part thereof, is used or can be used for a potential therapeutic indication: Imaging agent; Anticancer; Immunostimulant; Immunoconjugate; Monoclonal antibody, murine; Antisense therapy; antibody.

The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: integral plasma membrane protein; membrane, which are annotation(s) related to Cellular Component.

The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <dot expasy dot ch/sprot/>; or Locuslink, available from <dot ncbi dot nlm dot nih dot gov/projects/LocusLink/>.

Cluster HUMCEA can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term “number” in the right hand column of the table and the numbers on the y-axis of FIG. 33 refer to weighted expression of ESTs in each category, as “parts per million” (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
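The “parts per million” figure described above is a simple ratio. The following is a minimal sketch of that ratio only; it does not reproduce the weighting scheme referred to as previously described, and the function name `expression_ppm` and the counts in the usage line are hypothetical values chosen for illustration, not data from this application.

```python
def expression_ppm(cluster_ests, total_ests):
    """ESTs attributed to one cluster, per million ESTs in the same category.

    Plain (unweighted) ratio; any weighting applied upstream is assumed to
    have been folded into the two inputs already.
    """
    if total_ests <= 0:
        raise ValueError("total_ests must be positive")
    if cluster_ests < 0:
        raise ValueError("cluster_ests must be non-negative")
    return cluster_ests * 1_000_000 / total_ests


# Hypothetical counts: 3 cluster ESTs among 100,000 ESTs in a tissue category.
example_ppm = expression_ppm(3, 100_000)
print(example_ppm)
```

Expressing the value per million makes categories with very different total EST counts comparable on one axis.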

Overall, the following results were obtained as shown with regard to the histograms in FIG. 33 and Table 909. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: epithelial malignant tumors, a mixture of malignant tumors from different tissues and pancreas carcinoma.

TABLE 909 Normal tissue distribution

Name of Tissue    Number
colon             1175
epithelial        92
general           29
head and neck     81
kidney            0
lung              0
lymph nodes       0
breast            0
pancreas          0
prostate          0
stomach           256

TABLE 910 P values and ratios for expression in cancerous tissue

Name of Tissue    P1        P2        SP1       R3     SP2       R4
colon             2.0e−01   2.7e−01   9.8e−01   0.5    1         0.5
epithelial        2.1e−03   2.7e−02   6.4e−04   1.4    2.1e−01   1.0
general           3.9e−08   8.2e−06   9.2e−18   3.2    1.3e−10   2.2
head and neck     3.4e−01   5.0e−01   2.1e−01   1.8    5.6e−01   0.9
kidney            4.3e−01   5.3e−01   5.8e−01   2.1    7.0e−01   1.6
lung              1.3e−01   2.6e−01   1         1.1    1         1.1
lymph nodes       3.1e−01   5.7e−01   8.1e−02   6.0    3.3e−01   2.5
breast            3.8e−01   1.5e−01   1         1.0    6.8e−01   1.5
pancreas          2.2e−02   2.3e−02   1.4e−08   7.8    7.4e−07   6.4
prostate          5.3e−01   6.0e−01   3.0e−01   2.5    4.2e−01   2.0
stomach           1.5e−01   4.7e−01   8.9e−01   0.6    7.2e−01   0.4

For this cluster, at least one oligonucleotide was found to demonstrate overexpression of the cluster, although not of at least one transcript/segment as listed below. Microarray (chip) data is also available for this cluster as follows. Various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer, as previously described. The following oligonucleotides were found to hit this cluster but not other segments/transcripts below (in relation to lung cancer), shown in Table 911.

TABLE 911 Oligonucleotides related to this cluster

Oligonucleotide name                  Overexpressed in cancers   Chip reference
HUMCEA_0_0_15168 (SEQ ID NO: 243)     lung malignant tumors      LUN

As noted above, cluster HUMCEA features 5 transcript(s), which were listed in Table 905 above. These transcript(s) encode for protein(s) which are variant(s) of protein Carcinoembryonic antigen-related cell adhesion molecule 5 precursor (SEQ ID NO:1451). A description of each variant protein according to the present invention is now provided.

Variant protein HUMCEA_PEA_1_P4 (SEQ ID NO:1380) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMCEA_PEA_1_T8 (SEQ ID NO:109). An alignment is given to the known protein (Carcinoembryonic antigen-related cell adhesion molecule 5 precursor (SEQ ID NO:1451)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HUMCEA_PEA_1_P4 (SEQ ID NO:1380) and CEA5_HUMAN (SEQ ID NO:1451):

1. An isolated chimeric polypeptide encoding for HUMCEA_PEA_(—)1_P4 (SEQID NO:1380), comprising a first amino acid sequence being at least 90%homologous toMESPSAPPHRWCIPWQRLLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLVHNLPQHLFGYSWYKGERVDGNRQIIGYVIGTQQATPGPAYSGREIIYPNASLLIQNIIQNDTGFYTLHVIKSDLVNEEATGQFRVYPELPKPSISSNNSKPVEDKDAVAFTCEPETQDATYLWWVNNQSLPVSPRLQLSNGNRTLTLFNVTRNDTASYKCETQNPVSARRSDSVILNVL corresponding to amino acids 1-234 of CEA5_HUMAN(SEQ ID NO:1451), which also corresponds to amino acids 1-234 ofHUMCEA_PEA_(—)1_P4 (SEQ ID NO:1380), and a second amino acid sequencebeing at least 70%, optionally at least 80%, preferably at least 85%,more preferably at least 90% and most preferably at least 95% homologousto a polypeptide having the sequenceCEYICSSLAQAASPNPQGQRQDFSVPLRFKYTDPQPWTSRLSVTFCPRKTWADQVLTKNRRGGAASVLGGSGSTPYDGRNR (SEQ ID NO: 1749) corresponding to amino acids 235-315 ofHUMCEA_PEA_(—)1_P4 (SEQ ID NO:1380), wherein said first amino acidsequence and second amino acid sequence are contiguous and in asequential order.

2. An isolated polypeptide encoding for a tail of HUMCEA_PEA_1_P4 (SEQ ID NO:1380), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence CEYICSSLAQAASPNPQGQRQDFSVPLRFKYTDPQPWTSRLSVTFCPRKTWADQVLTKNRRGGAASVLGGSGSTPYDGRNR (SEQ ID NO: 1749) in HUMCEA_PEA_1_P4 (SEQ ID NO:1380).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein HUMCEA_PEA_1_P4 (SEQ ID NO:1380) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 912 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCEA_PEA_1_P4 (SEQ ID NO:1380) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 912 Amino acid mutations

SNP position(s) on amino acid sequence   Alternative amino acid(s)   Previously known SNP?
63                                       F -> L                      No
80                                       I -> V                      Yes
83                                       V -> A                      Yes
137                                      Q -> P                      Yes
173                                      D -> N                      No

The glycosylation sites of variant protein HUMCEA_PEA_1_P4 (SEQ ID NO:1380), as compared to the known protein Carcinoembryonic antigen-related cell adhesion molecule 5 precursor (SEQ ID NO:1451), are described in Table 913 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).

TABLE 913 Glycosylation site(s)

Position(s) on known amino acid sequence   Present in variant protein?   Position in variant protein?
197                                        yes                           197
466                                        no
360                                        no
288                                        no
665                                        no
560                                        no
650                                        no
480                                        no
104                                        yes                           104
580                                        no
204                                        yes                           204
115                                        yes                           115
208                                        yes                           208
152                                        yes                           152
309                                        no
432                                        no
351                                        no
246                                        no
182                                        yes                           182
612                                        no
256                                        no
508                                        no
330                                        no
274                                        no
292                                        no
553                                        no
529                                        no
375                                        no
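The pattern in Table 913 can be reproduced from the comparison report, which states that amino acids 1-234 of the variant correspond to amino acids 1-234 of the known protein. The following is a minimal sketch under the assumption that a site is retained exactly when it falls inside that shared region; this positional rule is an illustration that happens to match the table, not necessarily the method used to generate it. The names `KNOWN_SITES`, `SHARED_END` and `retained_sites` are hypothetical.

```python
# Glycosylation positions on the known protein, copied from Table 913.
KNOWN_SITES = [
    197, 466, 360, 288, 665, 560, 650, 480, 104, 580, 204, 115, 208, 152,
    309, 432, 351, 246, 182, 612, 256, 508, 330, 274, 292, 553, 529, 375,
]

# Last amino acid shared between HUMCEA_PEA_1_P4 and the known protein,
# per the comparison report (amino acids 1-234).
SHARED_END = 234


def retained_sites(sites, shared_end):
    """Sites lying inside the region shared with the known protein.

    Positions are unchanged because the shared region starts at residue 1
    in both sequences.
    """
    return sorted(p for p in sites if p <= shared_end)


p4_sites = retained_sites(KNOWN_SITES, SHARED_END)
print(p4_sites)
```

The seven positions returned are the same seven marked "yes" in Table 913, each at an unchanged position.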

Variant protein HUMCEA_PEA_1_P4 (SEQ ID NO:1380) is encoded by the following transcript(s): HUMCEA_PEA_1_T8 (SEQ ID NO:109), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMCEA_PEA_1_T8 (SEQ ID NO:109) is shown in bold; this coding portion starts at position 115 and ends at position 1059. The transcript also has the following SNPs as listed in Table 914 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCEA_PEA_1_P4 (SEQ ID NO:1380) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 914 Nucleic acid SNPs

SNP position on nucleotide sequence   Alternative nucleic acid   Previously known SNP?
49                                    T ->                       No
273                                   A -> C                     Yes
303                                   T -> G                     No
324                                   T -> C                     Yes
352                                   A -> G                     Yes
362                                   T -> C                     Yes
524                                   A -> C                     Yes
631                                   G -> A                     No
1315                                  A -> G                     No
1380                                  T -> C                     No
1533                                  C -> A                     Yes
1706                                  G -> A                     Yes
2308                                  T -> C                     No
2362                                  C -> T                     No
2455                                  A ->                       No
2504                                  C -> A                     Yes
2558                                  G ->                       No
2623                                  G ->                       No
2639                                  T -> A                     No
2640                                  T -> A                     No
2832                                  G -> A                     Yes
2885                                  C -> T                     No
3396                                  A -> G                     Yes
3562                                  C -> T                     Yes
3753                                  C -> T                     Yes
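The nucleotide positions in Table 914 and the amino acid positions in Table 912 are related through the stated start of the coding portion (position 115). The following is a minimal sketch of that conversion, assuming 1-based positions and a reading frame that begins at the first coding nucleotide; the names `CDS_START`, `CDS_END` and `codon_number` are hypothetical.

```python
CDS_START = 115   # first coding nucleotide of HUMCEA_PEA_1_T8, as stated
CDS_END = 1059    # last coding nucleotide of HUMCEA_PEA_1_T8, as stated


def codon_number(nt_pos, cds_start=CDS_START, cds_end=CDS_END):
    """1-based amino acid position of the codon containing nt_pos.

    Raises ValueError for positions outside the coding portion, since
    those cannot change the encoded protein sequence directly.
    """
    if not cds_start <= nt_pos <= cds_end:
        raise ValueError(f"position {nt_pos} is outside the coding portion")
    return (nt_pos - cds_start) // 3 + 1


# Nucleotide SNP positions from Table 914 that lie in the coding portion.
for nt in (273, 303, 324, 352, 362, 524, 631):
    print(nt, "->", codon_number(nt))
```

Positions 303, 352, 362, 524 and 631 map to amino acids 63, 80, 83, 137 and 173, which are the positions listed in Table 912. Positions 273 and 324 also fall in the coding portion but are absent from Table 912, which is what would be expected of silent changes.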

Variant protein HUMCEA_PEA_1_P5 (SEQ ID NO:1381) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMCEA_PEA_1_T9 (SEQ ID NO:110). An alignment is given to the known protein (Carcinoembryonic antigen-related cell adhesion molecule 5 precursor (SEQ ID NO:1451)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HUMCEA_PEA_1_P5 (SEQ ID NO:1381) and CEA5_HUMAN (SEQ ID NO:1451):

1. An isolated chimeric polypeptide encoding for HUMCEA_PEA_(—)1_P5 (SEQID NO:1381), comprising a first amino acid sequence being at least 90%homologous toMESPSAPPHRWCIPWQRLLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLVHNLPQHLFGYSWYKGERVDGNRQIIGYVIGTQQATPGPAYSGREIIYPNASLLIQNIIQNDTGFYTLHVIKSDLVNEEATGQFRVYPELPKPSISSNNSKPVEDKDAVAFTCEPETQDATYLWWVNNQSLPVSPRLQLSNGNRTLTLFNVTRNDTASYKCETQNPVSARRSDSVILNVLYGPDAPTISPLNTSYRSGENLNLSCHAASNPPAQYSWFVNGTFQQSTQELFIPNITVNNSGSYTCQAHNSDTGLNRTTVTTITVYAEPPKPFITSNNSNPVEDEDAVALTCEPEIQNTTYLWWVNNQSLPVSPRLQLSNDNRTLTLLSVTRNDVGPYECGIQNELSVDHSDPVILNVLYGPDDPTISPSYTYYRPGVNLSLSCHAASNPPAQYSWLIDGNIQQHTQELFISNITEKNSGLYTCQANNSASGHSRTTVKTITVSAELPKPSISSNNSKPVEDKDAVAFTCEPEAQNTTYLWWVNGQSLPVSPRLQLSNGNRTLTLFNVTRNDARAYVCGIQNSVSANRSDPVTLDVLYGPDTPIISPPDSSYLSGANLNLSCHSASNPSPQYSWRINGIPQQHTQVLFIAKITPNNNGTYACFVSNLATGRNNSIVKSITVS corresponding to amino acids 1-675 ofCEA5_HUMAN (SEQ ID NO:1451), which also corresponds to amino acids 1-675of HUMCEA_PEA_(—)1_P5 (SEQ ID NO:1381), and a second amino acid sequencebeing at least 70%, optionally at least 80%, preferably at least 85%,more preferably at least 90% and most preferably at least 95% homologousto a polypeptide having the sequenceGKWLPGASASYSGVESIWFSPKSQEDIFFPSLCSMGTRKSQILS (SEQ ID NO: 1750)corresponding to amino acids 676-719 of HUMCEA_PEA_(—)1_P5 (SEQ IDNO:1381), wherein said first amino acid sequence and second amino acidsequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of HUMCEA_PEA_1_P5 (SEQ ID NO:1381), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GKWLPGASASYSGVESIWFSPKSQEDIFFPSLCSMGTRKSQILS (SEQ ID NO: 1750) in HUMCEA_PEA_1_P5 (SEQ ID NO:1381).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein HUMCEA_PEA_1_P5 (SEQ ID NO:1381) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 915 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCEA_PEA_1_P5 (SEQ ID NO:1381) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 915 Amino acid mutations

SNP position(s) on amino acid sequence   Alternative amino acid(s)   Previously known SNP?
63                                       F -> L                      No
80                                       I -> V                      Yes
83                                       V -> A                      Yes
137                                      Q -> P                      Yes
173                                      D -> N                      No
289                                      I -> T                      No
340                                      A -> D                      Yes
398                                      E -> K                      Yes
647                                      P ->                        No
664                                      R -> S                      Yes

The glycosylation sites of variant protein HUMCEA_PEA_1_P5 (SEQ ID NO:1381), as compared to the known protein Carcinoembryonic antigen-related cell adhesion molecule 5 precursor (SEQ ID NO:1451), are described in Table 916 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).

TABLE 916 Glycosylation site(s)

Position(s) on known amino acid sequence   Present in variant protein?   Position in variant protein?
197                                        yes                           197
466                                        yes                           466
360                                        yes                           360
288                                        yes                           288
665                                        yes                           665
560                                        yes                           560
650                                        yes                           650
480                                        yes                           480
104                                        yes                           104
580                                        yes                           580
204                                        yes                           204
115                                        yes                           115
208                                        yes                           208
152                                        yes                           152
309                                        yes                           309
432                                        yes                           432
351                                        yes                           351
246                                        yes                           246
182                                        yes                           182
612                                        yes                           612
256                                        yes                           256
508                                        yes                           508
330                                        yes                           330
274                                        yes                           274
292                                        yes                           292
553                                        yes                           553
529                                        yes                           529
375                                        yes                           375

Variant protein HUMCEA_PEA_1_P5 (SEQ ID NO:1381) is encoded by the following transcript(s): HUMCEA_PEA_1_T9 (SEQ ID NO:110), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMCEA_PEA_1_T9 (SEQ ID NO:110) is shown in bold; this coding portion starts at position 115 and ends at position 2271. The transcript also has the following SNPs as listed in Table 917 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCEA_PEA_1_P5 (SEQ ID NO:1381) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 917 Nucleic acid SNPs

SNP position on nucleotide sequence   Alternative nucleic acid   Previously known SNP?
49                                    T ->                       No
273                                   A -> C                     Yes
303                                   T -> G                     No
324                                   T -> C                     Yes
352                                   A -> G                     Yes
362                                   T -> C                     Yes
524                                   A -> C                     Yes
631                                   G -> A                     No
915                                   A -> G                     No
980                                   T -> C                     No
1133                                  C -> A                     Yes
1306                                  G -> A                     Yes
1908                                  T -> C                     No
1962                                  C -> T                     No
2055                                  A ->                       No
2104                                  C -> A                     Yes
3259                                  T -> C                     Yes

Variant protein HUMCEA_PEA_1_P14 (SEQ ID NO:1382) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMCEA_PEA_1_T20 (SEQ ID NO:111). The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein HUMCEA_PEA_1_P14 (SEQ ID NO:1382) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 918 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCEA_PEA_1_P14 (SEQ ID NO:1382) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 918 Amino acid mutations

SNP position(s) on amino acid sequence   Alternative amino acid(s)   Previously known SNP?
63                                       F -> L                      No
80                                       I -> V                      Yes
83                                       V -> A                      Yes
137                                      Q -> P                      Yes
173                                      D -> N                      No
289                                      I -> T                      No
340                                      A -> D                      Yes
398                                      E -> K                      Yes

Variant protein HUMCEA_PEA_1_P14 (SEQ ID NO:1382) is encoded by the following transcript(s): HUMCEA_PEA_1_T20 (SEQ ID NO:111), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMCEA_PEA_1_T20 (SEQ ID NO:111) is shown in bold; this coding portion starts at position 115 and ends at position 1821. The transcript also has the following SNPs as listed in Table 919 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCEA_PEA_1_P14 (SEQ ID NO:1382) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 919 Nucleic acid SNPs

SNP position on nucleotide sequence   Alternative nucleic acid   Previously known SNP?
49                                    T ->                       No
273                                   A -> C                     Yes
303                                   T -> G                     No
324                                   T -> C                     Yes
352                                   A -> G                     Yes
362                                   T -> C                     Yes
524                                   A -> C                     Yes
631                                   G -> A                     No
915                                   A -> G                     No
980                                   T -> C                     No
1133                                  C -> A                     Yes
1306                                  G -> A                     Yes
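The stated coding portions can be sanity-checked against the protein lengths implied by the comparison reports. The following is a minimal sketch, assuming 1-based inclusive coordinates; the names `CODING_PORTIONS` and `codon_count` are hypothetical. Whether the stated end position includes a stop codon is not said in this passage; the arithmetic below suggests it does not, but that reading is an inference.

```python
# Coding portion (start, end) of each transcript, as stated in the text.
CODING_PORTIONS = {
    "HUMCEA_PEA_1_T8": (115, 1059),
    "HUMCEA_PEA_1_T9": (115, 2271),
    "HUMCEA_PEA_1_T20": (115, 1821),
}


def codon_count(start, end):
    """Number of whole codons in an inclusive coding range.

    Raises ValueError when the range is not a multiple of three, which
    would indicate a transcription or extraction error in the positions.
    """
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"coding length {length} is not a multiple of 3")
    return length // 3


codons = {name: codon_count(s, e) for name, (s, e) in CODING_PORTIONS.items()}
print(codons)
```

The counts for T8 and T9 (315 and 719) equal the last amino acid positions given for HUMCEA_PEA_1_P4 and HUMCEA_PEA_1_P5 in their comparison reports. The count for T20 (569) is computed only; no length for HUMCEA_PEA_1_P14 is stated in this passage to compare it with.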

Variant protein HUMCEA_PEA_1_P19 (SEQ ID NO:1383) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMCEA_PEA_1_T25 (SEQ ID NO:112). An alignment is given to the known protein (Carcinoembryonic antigen-related cell adhesion molecule 5 precursor (SEQ ID NO:1451)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HUMCEA_PEA_1_P19 (SEQ ID NO:1383) and CEA5_HUMAN (SEQ ID NO:1451):

1. An isolated chimeric polypeptide encoding for HUMCEA_PEA_(—)1_P19(SEQ ID NO:1383), comprising a first amino acid sequence being at least90% homologous toMESPSAPPHRWCIPWQRLLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLVHNLPQHLFGYSWYKGERVDGNRQIIGYVIGTQQATPGPAYSGREIIYPNASLLIQNIIQNDTGFYTLHVIKSDLVNEEATGQFRVYPELPKPSISSNNSKPVEDKDAVAFTCEPETQDATYLWWVNNQSLPVSPRLQLSNGNRTLTLFNVTRNDTASYKCETQNPVSARRSDSVILN corresponding to amino acids 1-232 of CEA5_HUMAN (SEQID NO:1451), which also corresponds to amino acids 1-232 ofHUMCEA_PEA_(—)1_P19 (SEQ ID NO:1383), and a second amino acid sequencebeing at least 90% homologous toVLYGPDTPIISPPDSSYLSGANLNLSCHSASNPSPQYSWRINGIPQQHTQVLFIAKITPNNNGTYACFVSNLATGRNNSIVKSITVSASGTSPGLSAGATVGIMIGVLVGVALI corresponding to amino acids589-702 of CEA5_HUMAN (SEQ ID NO:1451), which also corresponds to aminoacids 233-346 of HUMCEA_PEA_(—)1_P19 (SEQ ID NO:1383), wherein saidfirst amino acid sequence and second amino acid sequence are contiguousand in a sequential order.

2. An isolated chimeric polypeptide encoding for an edge portion of HUMCEA_PEA_1_P19 (SEQ ID NO:1383), comprising a polypeptide having a length “n”, wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise NV, having a structure as follows: a sequence starting from any of amino acid numbers 232−x to 232; and ending at any of amino acid numbers 233+((n−2)−x), in which x varies from 0 to n−2.
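The edge-portion definition above can be illustrated by a minimal sketch. The following Python enumerates, for a chosen length n, every (start, end) pair of amino acid positions that spans the junction between positions 232 and 233. The function name `edge_windows` and its parameter names are illustrative assumptions and do not appear in the application.

```python
def edge_windows(n, left=232, right=233):
    """Return (start, end) amino acid positions of every length-n window
    that contains both junction residues `left` and `right`.

    Follows the text: start = left - x, end = right + ((n - 2) - x),
    with x varying from 0 to n - 2.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to span the junction")
    return [(left - x, right + (n - 2) - x) for x in range(n - 1)]


windows = edge_windows(10)
for start, end in windows:
    # Each window is exactly n residues long and covers both 232 and 233.
    assert end - start + 1 == 10
    assert start <= 232 and end >= 233
print(windows[0], windows[-1])  # (232, 241) (224, 233)
```

For n = 10 this gives nine windows, one for each value of x from 0 to 8. The sketch does not check that a window stays within the 346 amino acids of the variant protein.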

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because of manual inspection of known protein localization and/or gene structure.

Variant protein HUMCEA_PEA_1_P19 (SEQ ID NO:1383) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 920 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCEA_PEA_1_P19 (SEQ ID NO:1383) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 920 Amino acid mutations

SNP position(s) on amino acid sequence   Alternative amino acid(s)   Previously known SNP?
63                                       F -> L                      No
80                                       I -> V                      Yes
83                                       V -> A                      Yes
137                                      Q -> P                      Yes
173                                      D -> N                      No
291                                      P ->                        No
308                                      R -> S                      Yes
326                                      G ->                        No

The glycosylation sites of variant protein HUMCEA_PEA_1_P19 (SEQ ID NO:1383), as compared to the known protein Carcinoembryonic antigen-related cell adhesion molecule 5 precursor (SEQ ID NO:1451), are described in Table 921 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).

TABLE 921 Glycosylation site(s)

Position(s) on known amino acid sequence   Present in variant protein?   Position in variant protein?
197                                        yes                           197
466                                        no
360                                        no
288                                        no
665                                        yes                           309
560                                        no
650                                        yes                           294
480                                        no
104                                        yes                           104
580                                        no
204                                        yes                           204
115                                        yes                           115
208                                        yes                           208
152                                        yes                           152
309                                        no
432                                        no
351                                        no
246                                        no
182                                        yes                           182
612                                        yes                           256
256                                        no
508                                        no
330                                        no
274                                        no
292                                        no
553                                        no
529                                        no
375                                        no
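The positions in the last column of Table 921 are consistent with the two aligned blocks stated in the comparison report (amino acids 1-232 of the known protein map to 1-232 of the variant, and 589-702 map to 233-346). The following Python is a minimal sketch of that position arithmetic only. The names `P19_BLOCKS` and `map_to_variant` are illustrative assumptions, and the sketch does not predict whether a site is actually glycosylated.

```python
# Aligned blocks from the comparison report for HUMCEA_PEA_1_P19:
# (known_start, known_end, variant_start), all 1-based and inclusive.
P19_BLOCKS = [(1, 232, 1), (589, 702, 233)]


def map_to_variant(known_pos, blocks):
    """Map a position on the known protein to the variant protein.

    Returns None when the position lies in a region absent from the variant.
    """
    for known_start, known_end, variant_start in blocks:
        if known_start <= known_pos <= known_end:
            return variant_start + (known_pos - known_start)
    return None


for site in (197, 466, 665, 650, 612):
    print(site, map_to_variant(site, P19_BLOCKS))
```

Sites 197, 665, 650 and 612 map to 197, 309, 294 and 256 respectively, matching the table, while 466 falls in the region missing from the variant.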

Variant protein HUMCEA_PEA_1_P19 (SEQ ID NO:1383) is encoded by the following transcript(s): HUMCEA_PEA_1_T25 (SEQ ID NO:112), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMCEA_PEA_1_T25 (SEQ ID NO:112) is shown in bold; this coding portion starts at position 115 and ends at position 1152. The transcript also has the following SNPs as listed in Table 922 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCEA_PEA_1_P19 (SEQ ID NO:1383) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 922 Nucleic acid SNPs

SNP position on nucleotide sequence   Alternative nucleic acid   Previously known SNP?
49                                    T ->                       No
273                                   A -> C                     Yes
303                                   T -> G                     No
324                                   T -> C                     Yes
352                                   A -> G                     Yes
362                                   T -> C                     Yes
524                                   A -> C                     Yes
631                                   G -> A                     No
840                                   T -> C                     No
894                                   C -> T                     No
987                                   A ->                       No
1036                                  C -> A                     Yes
1090                                  G ->                       No
1155                                  G ->                       No
1171                                  T -> A                     No
1172                                  T -> A                     No
1364                                  G -> A                     Yes
1417                                  C -> T                     No
1928                                  A -> G                     Yes
2094                                  C -> T                     Yes
2285                                  C -> T                     Yes
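The relationship between the nucleotide SNP positions of Table 922 and the amino acid positions of Table 920 follows from the stated coding portion (positions 115 to 1152). The following Python is a minimal sketch of that arithmetic, assuming an uninterrupted reading frame beginning at position 115. The names `CDS_START`, `CDS_END` and `codon_index` are illustrative assumptions.

```python
CDS_START = 115   # first coding position of HUMCEA_PEA_1_T25, per the text
CDS_END = 1152    # last coding position, per the text


def codon_index(nt_pos, cds_start=CDS_START, cds_end=CDS_END):
    """Return the 1-based amino acid position whose codon contains nt_pos,
    or None if nt_pos lies outside the coding portion."""
    if not cds_start <= nt_pos <= cds_end:
        return None
    return (nt_pos - cds_start) // 3 + 1


protein_length = (CDS_END - CDS_START + 1) // 3
print(protein_length)  # 346
for nt in (303, 352, 631, 1036, 49, 1364):
    print(nt, codon_index(nt))
```

Under this assumption, nucleotide positions 303, 352, 631 and 1036 fall in codons 63, 80, 173 and 308, which are the positions listed in Table 920, while positions 49 and 1364 lie outside the coding portion.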

Variant protein HUMCEA_PEA_1_P20 (SEQ ID NO:1384) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMCEA_PEA_1_T26 (SEQ ID NO:113). An alignment is given to the known protein (Carcinoembryonic antigen-related cell adhesion molecule 5 precursor (SEQ ID NO:1451)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HUMCEA_PEA_1_P20 (SEQ ID NO:1384) and CEA5_HUMAN (SEQ ID NO:1451):

1. An isolated chimeric polypeptide encoding for HUMCEA_PEA_1_P20 (SEQ ID NO:1384), comprising a first amino acid sequence being at least 90% homologous to MESPSAPPHRWCIPWQRLLLTASLLTFWNPPTTAKLTIESTPFNVAEGKEVLLLVHNLPQHLFGYSWYKGERVDGNRQIIGYVIGTQQATPGPAYSGREIIYPNASLLIQNIIQNDTGFYTLHVIKSDLVNEEATGQFRVYP corresponding to amino acids 1-142 of CEA5_HUMAN (SEQ ID NO:1451), which also corresponds to amino acids 1-142 of HUMCEA_PEA_1_P20 (SEQ ID NO:1384), and a second amino acid sequence being at least 90% homologous to ELPKPSISSNNSKPVEDKDAVAFTCEPEAQNTTYLWWVNGQSLPVSPRLQLSNGNRTLTLFNVTRNDARAYVCGIQNSVSANRSDPVTLDVLYGPDTPIISPPDSSYLSGANLNLSCHSASNPSPQYSWRINGIPQQHTQVLFIAKITPNNNGTYACFVSNLATGRNNSIVKSITVSASGTSPGLSAGATVGIMIGVLVGVALI corresponding to amino acids 499-702 of CEA5_HUMAN (SEQ ID NO:1451), which also corresponds to amino acids 143-346 of HUMCEA_PEA_1_P20 (SEQ ID NO:1384), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated chimeric polypeptide encoding for an edge portion of HUMCEA_PEA_1_P20 (SEQ ID NO:1384), comprising a polypeptide having a length “n”, wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise PE, having a structure as follows: a sequence starting from any of amino acid numbers 142−x to 142; and ending at any of amino acid numbers 143+((n−2)−x), in which x varies from 0 to n−2.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because of manual inspection of known protein localization and/or gene structure.

Variant protein HUMCEA_PEA_1_P20 (SEQ ID NO:1384) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 923 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCEA_PEA_1_P20 (SEQ ID NO:1384) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 923 Amino acid mutations

SNP position(s) on amino acid sequence   Alternative amino acid(s)   Previously known SNP?
63                                       F -> L                      No
80                                       I -> V                      Yes
83                                       V -> A                      Yes
137                                      Q -> P                      Yes
291                                      P ->                        No
308                                      R -> S                      Yes
326                                      G ->                        No

The glycosylation sites of variant protein HUMCEA_PEA_1_P20 (SEQ ID NO:1384), as compared to the known protein Carcinoembryonic antigen-related cell adhesion molecule 5 precursor (SEQ ID NO:1451), are described in Table 924 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).

TABLE 924 Glycosylation site(s)

Position(s) on known amino acid sequence   Present in variant protein?   Position in variant protein?
197                                        no
466                                        no
360                                        no
288                                        no
665                                        yes                           309
560                                        yes                           204
650                                        yes                           294
480                                        no
104                                        yes                           104
580                                        yes                           224
204                                        no
115                                        yes                           115
208                                        no
152                                        no
309                                        no
432                                        no
351                                        no
246                                        no
182                                        no
612                                        yes                           256
256                                        no
508                                        yes                           152
330                                        no
274                                        no
292                                        no
553                                        yes                           197
529                                        yes                           173
375                                        no

Variant protein HUMCEA_PEA_1_P20 (SEQ ID NO:1384) is encoded by the following transcript(s): HUMCEA_PEA_1_T26 (SEQ ID NO:113), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMCEA_PEA_1_T26 (SEQ ID NO:113) is shown in bold; this coding portion starts at position 115 and ends at position 1152. The transcript also has the following SNPs as listed in Table 925 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCEA_PEA_1_P20 (SEQ ID NO:1384) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 925 Nucleic acid SNPs

SNP position on nucleotide sequence   Alternative nucleic acid   Previously known SNP?
49                                    T ->                       No
273                                   A -> C                     Yes
303                                   T -> G                     No
324                                   T -> C                     Yes
352                                   A -> G                     Yes
362                                   T -> C                     Yes
524                                   A -> C                     Yes
840                                   T -> C                     No
894                                   C -> T                     No
987                                   A ->                       No
1036                                  C -> A                     Yes
1090                                  G ->                       No
1155                                  G ->                       No
1171                                  T -> A                     No
1172                                  T -> A                     No
1364                                  G -> A                     Yes
1417                                  C -> T                     No
1928                                  A -> G                     Yes
2094                                  C -> T                     Yes
2285                                  C -> T                     Yes

As noted above, cluster HUMCEA features 42 segment(s), which were listed in Table 906 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster HUMCEA_PEA_1_node_0 (SEQ ID NO:814) according to the present invention is supported by 56 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8 (SEQ ID NO:109), HUMCEA_PEA_1_T9 (SEQ ID NO:110), HUMCEA_PEA_1_T20 (SEQ ID NO:111), HUMCEA_PEA_1_T25 (SEQ ID NO:112) and HUMCEA_PEA_1_T26 (SEQ ID NO:113). Table 926 below describes the starting and ending position of this segment on each transcript.

TABLE 926 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
HUMCEA_PEA_1_T8 (SEQ ID NO: 109)     1                           178
HUMCEA_PEA_1_T9 (SEQ ID NO: 110)     1                           178
HUMCEA_PEA_1_T20 (SEQ ID NO: 111)    1                           178
HUMCEA_PEA_1_T25 (SEQ ID NO: 112)    1                           178
HUMCEA_PEA_1_T26 (SEQ ID NO: 113)    1                           178

Segment cluster HUMCEA_PEA_1_node_2 (SEQ ID NO:815) according to the present invention is supported by 83 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8 (SEQ ID NO:109), HUMCEA_PEA_1_T9 (SEQ ID NO:110), HUMCEA_PEA_1_T20 (SEQ ID NO:111), HUMCEA_PEA_1_T25 (SEQ ID NO:112) and HUMCEA_PEA_1_T26 (SEQ ID NO:113). Table 927 below describes the starting and ending position of this segment on each transcript.

TABLE 927 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
HUMCEA_PEA_1_T8 (SEQ ID NO: 109)     179                         456
HUMCEA_PEA_1_T9 (SEQ ID NO: 110)     179                         456
HUMCEA_PEA_1_T20 (SEQ ID NO: 111)    179                         456
HUMCEA_PEA_1_T25 (SEQ ID NO: 112)    179                         456
HUMCEA_PEA_1_T26 (SEQ ID NO: 113)    179                         456

Segment cluster HUMCEA_PEA_1_node_11 (SEQ ID NO:816) according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8 (SEQ ID NO:109). Table 928 below describes the starting and ending position of this segment on each transcript.

TABLE 928 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
HUMCEA_PEA_1_T8 (SEQ ID NO: 109)     818                         1217

Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment (in relation to lung cancer), shown in Table 929.

TABLE 929 Oligonucleotides related to this segment

Oligonucleotide name              Overexpressed in cancers   Chip reference
HUMCEA_0_0_96 (SEQ ID NO: 240)    lung malignant tumors      LUN

Segment cluster HUMCEA_PEA_1_node_12 (SEQ ID NO:817) according to the present invention is supported by 83 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8 (SEQ ID NO:109), HUMCEA_PEA_1_T9 (SEQ ID NO:110) and HUMCEA_PEA_1_T20 (SEQ ID NO:111). Table 930 below describes the starting and ending position of this segment on each transcript.

TABLE 930 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
HUMCEA_PEA_1_T8 (SEQ ID NO: 109)     1218                        1472
HUMCEA_PEA_1_T9 (SEQ ID NO: 110)     818                         1072
HUMCEA_PEA_1_T20 (SEQ ID NO: 111)    818                         1072
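The different positions of the same segment on different transcripts in Table 930 can be accounted for by the preceding segment, which Table 928 lists only for HUMCEA_PEA_1_T8. The following Python is a minimal sketch of that arithmetic using the positions from Tables 928 and 930. The helper name `segment_length` and the variable names are illustrative assumptions.

```python
def segment_length(start, end):
    """Length in nucleotides of a segment given inclusive 1-based positions."""
    return end - start + 1


# Positions copied from Tables 928 and 930.
node_11_on_t8 = (818, 1217)
node_12 = {"T8": (1218, 1472), "T9": (818, 1072), "T20": (818, 1072)}

inserted = segment_length(*node_11_on_t8)
offset_t8_vs_t9 = node_12["T8"][0] - node_12["T9"][0]
lengths = {name: segment_length(*span) for name, span in node_12.items()}

print(inserted, offset_t8_vs_t9, lengths)
```

The 400-nucleotide segment present only on T8 equals the 400-position shift of the following segment on T8 relative to T9 and T20, and the segment has the same length of 255 nucleotides on all three transcripts.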

Segment cluster HUMCEA_PEA_1_node_31 (SEQ ID NO:818) according to the present invention is supported by 87 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8 (SEQ ID NO:109), HUMCEA_PEA_1_T9 (SEQ ID NO:110) and HUMCEA_PEA_1_T20 (SEQ ID NO:111). Table 931 below describes the starting and ending position of this segment on each transcript.

TABLE 931 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
HUMCEA_PEA_1_T8 (SEQ ID NO: 109)     1817                        2006
HUMCEA_PEA_1_T9 (SEQ ID NO: 110)     1417                        1606
HUMCEA_PEA_1_T20 (SEQ ID NO: 111)    1417                        1606

Segment cluster HUMCEA_PEA_1_node_36 (SEQ ID NO:819) according to the present invention is supported by 94 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8 (SEQ ID NO:109), HUMCEA_PEA_1_T9 (SEQ ID NO:110) and HUMCEA_PEA_1_T26 (SEQ ID NO:113). Table 932 below describes the starting and ending position of this segment on each transcript.

TABLE 932 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
HUMCEA_PEA_1_T8 (SEQ ID NO: 109)     2159                        2285
HUMCEA_PEA_1_T9 (SEQ ID NO: 110)     1759                        1885
HUMCEA_PEA_1_T26 (SEQ ID NO: 113)    691                         817

Segment cluster HUMCEA_PEA_1_node_44 (SEQ ID NO:820) according to the present invention is supported by 112 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8 (SEQ ID NO:109), HUMCEA_PEA_1_T9 (SEQ ID NO:110), HUMCEA_PEA_1_T25 (SEQ ID NO:112) and HUMCEA_PEA_1_T26 (SEQ ID NO:113). Table 933 below describes the starting and ending position of this segment on each transcript.

TABLE 933 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
HUMCEA_PEA_1_T8 (SEQ ID NO: 109)     2286                        2540
HUMCEA_PEA_1_T9 (SEQ ID NO: 110)     1886                        2140
HUMCEA_PEA_1_T25 (SEQ ID NO: 112)    818                         1072
HUMCEA_PEA_1_T26 (SEQ ID NO: 113)    818                         1072

Segment cluster HUMCEA_PEA_1_node_46 (SEQ ID NO:821) according to the present invention is supported by 15 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T9 (SEQ ID NO:110). Table 934 below describes the starting and ending position of this segment on each transcript.

TABLE 934 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
HUMCEA_PEA_1_T9 (SEQ ID NO: 110)     2174                        3347

Segment cluster HUMCEA_PEA_1_node_63 (SEQ ID NO:822) according to the present invention is supported by 68 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8 (SEQ ID NO:109), HUMCEA_PEA_1_T25 (SEQ ID NO:112) and HUMCEA_PEA_1_T26 (SEQ ID NO:113). Table 935 below describes the starting and ending position of this segment on each transcript.

TABLE 935 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
HUMCEA_PEA_1_T8 (SEQ ID NO: 109)     2957                        3135
HUMCEA_PEA_1_T25 (SEQ ID NO: 112)    1489                        1667
HUMCEA_PEA_1_T26 (SEQ ID NO: 113)    1489                        1667

Segment cluster HUMCEA_PEA_1_node_65 (SEQ ID NO:823) according to the present invention is supported by 54 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8 (SEQ ID NO:109), HUMCEA_PEA_1_T25 (SEQ ID NO:112) and HUMCEA_PEA_1_T26 (SEQ ID NO:113). Table 936 below describes the starting and ending position of this segment on each transcript.

TABLE 936 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
HUMCEA_PEA_1_T8 (SEQ ID NO: 109)     3166                        3897
HUMCEA_PEA_1_T25 (SEQ ID NO: 112)    1698                        2429
HUMCEA_PEA_1_T26 (SEQ ID NO: 113)    1698                        2429

Segment cluster HUMCEA_PEA_1_node_67 (SEQ ID NO:824) according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T20 (SEQ ID NO:111). Table 937 below describes the starting and ending position of this segment on each transcript.

TABLE 937 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
HUMCEA_PEA_1_T20 (SEQ ID NO: 111)    1607                        1886

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.

Segment cluster HUMCEA_PEA_1_node_3 (SEQ ID NO:825) according to the present invention is supported by 67 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8 (SEQ ID NO:109), HUMCEA_PEA_1_T9 (SEQ ID NO:110), HUMCEA_PEA_1_T20 (SEQ ID NO:111), HUMCEA_PEA_1_T25 (SEQ ID NO:112) and HUMCEA_PEA_1_T26 (SEQ ID NO:113). Table 938 below describes the starting and ending position of this segment on each transcript.

TABLE 938 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
HUMCEA_PEA_1_T8 (SEQ ID NO: 109)     457                         538
HUMCEA_PEA_1_T9 (SEQ ID NO: 110)     457                         538
HUMCEA_PEA_1_T20 (SEQ ID NO: 111)    457                         538
HUMCEA_PEA_1_T25 (SEQ ID NO: 112)    457                         538
HUMCEA_PEA_1_T26 (SEQ ID NO: 113)    457                         538

Segment cluster HUMCEA_PEA_1_node_7 (SEQ ID NO:826) according to the present invention is supported by 73 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8 (SEQ ID NO:109), HUMCEA_PEA_1_T9 (SEQ ID NO:110), HUMCEA_PEA_1_T20 (SEQ ID NO:111) and HUMCEA_PEA_1_T25 (SEQ ID NO:112). Table 939 below describes the starting and ending position of this segment on each transcript.

TABLE 939 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
HUMCEA_PEA_1_T8 (SEQ ID NO: 109)     539                         642
HUMCEA_PEA_1_T9 (SEQ ID NO: 110)     539                         642
HUMCEA_PEA_1_T20 (SEQ ID NO: 111)    539                         642
HUMCEA_PEA_1_T25 (SEQ ID NO: 112)    539                         642

Segment cluster HUMCEA_PEA_1_node_8 (SEQ ID NO:827) according to the present invention is supported by 67 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8 (SEQ ID NO:109), HUMCEA_PEA_1_T9 (SEQ ID NO:110), HUMCEA_PEA_1_T20 (SEQ ID NO:111) and HUMCEA_PEA_1_T25 (SEQ ID NO:112). Table 940 below describes the starting and ending position of this segment on each transcript.

TABLE 940 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
HUMCEA_PEA_1_T8 (SEQ ID NO: 109)     643                         690
HUMCEA_PEA_1_T9 (SEQ ID NO: 110)     643                         690
HUMCEA_PEA_1_T20 (SEQ ID NO: 111)    643                         690
HUMCEA_PEA_1_T25 (SEQ ID NO: 112)    643                         690

Segment cluster HUMCEA_PEA_1_node_9 (SEQ ID NO:828) according to the present invention is supported by 71 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8 (SEQ ID NO:109), HUMCEA_PEA_1_T9 (SEQ ID NO:110), HUMCEA_PEA_1_T20 (SEQ ID NO:111) and HUMCEA_PEA_1_T25 (SEQ ID NO:112). Table 941 below describes the starting and ending position of this segment on each transcript.

TABLE 941 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
HUMCEA_PEA_1_T8 (SEQ ID NO: 109)     691                         738
HUMCEA_PEA_1_T9 (SEQ ID NO: 110)     691                         738
HUMCEA_PEA_1_T20 (SEQ ID NO: 111)    691                         738
HUMCEA_PEA_1_T25 (SEQ ID NO: 112)    691                         738

Segment cluster HUMCEA_PEA_1_node_10 (SEQ ID NO:829) according to the present invention is supported by 67 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8 (SEQ ID NO:109), HUMCEA_PEA_1_T9 (SEQ ID NO:110), HUMCEA_PEA_1_T20 (SEQ ID NO:111) and HUMCEA_PEA_1_T25 (SEQ ID NO:112). Table 942 below describes the starting and ending position of this segment on each transcript.

TABLE 942 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
HUMCEA_PEA_1_T8 (SEQ ID NO: 109)     739                         817
HUMCEA_PEA_1_T9 (SEQ ID NO: 110)     739                         817
HUMCEA_PEA_1_T20 (SEQ ID NO: 111)    739                         817
HUMCEA_PEA_1_T25 (SEQ ID NO: 112)    739                         817

Segment cluster HUMCEA_PEA_1_node_15 (SEQ ID NO:830) according to the present invention can be found in the following transcript(s): HUMCEA_PEA_1_T8 (SEQ ID NO:109), HUMCEA_PEA_1_T9 (SEQ ID NO:110) and HUMCEA_PEA_1_T20 (SEQ ID NO:111). Table 943 below describes the starting and ending position of this segment on each transcript.

TABLE 943 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
HUMCEA_PEA_1_T8 (SEQ ID NO: 109)     1473                        1475
HUMCEA_PEA_1_T9 (SEQ ID NO: 110)     1073                        1075
HUMCEA_PEA_1_T20 (SEQ ID NO: 111)    1073                        1075

Segment cluster HUMCEA_PEA_1_node_16 (SEQ ID NO:831) according to the present invention can be found in the following transcript(s): HUMCEA_PEA_1_T8 (SEQ ID NO:109), HUMCEA_PEA_1_T9 (SEQ ID NO:110) and HUMCEA_PEA_1_T20 (SEQ ID NO:111). Table 944 below describes the starting and ending position of this segment on each transcript.

TABLE 944 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
HUMCEA_PEA_1_T8 (SEQ ID NO: 109)     1476                        1481
HUMCEA_PEA_1_T9 (SEQ ID NO: 110)     1076                        1081
HUMCEA_PEA_1_T20 (SEQ ID NO: 111)    1076                        1081

Segment cluster HUMCEA_PEA_1_node_17 (SEQ ID NO:832) according to the present invention can be found in the following transcript(s): HUMCEA_PEA_1_T8 (SEQ ID NO:109), HUMCEA_PEA_1_T9 (SEQ ID NO:110) and HUMCEA_PEA_1_T20 (SEQ ID NO:111). Table 945 below describes the starting and ending position of this segment on each transcript.

TABLE 945 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
HUMCEA_PEA_1_T8 (SEQ ID NO: 109)     1482                        1488
HUMCEA_PEA_1_T9 (SEQ ID NO: 110)     1082                        1088
HUMCEA_PEA_1_T20 (SEQ ID NO: 111)    1082                        1088

Segment cluster HUMCEA_PEA_1_node_18 (SEQ ID NO:833) according to the present invention can be found in the following transcript(s): HUMCEA_PEA_1_T8 (SEQ ID NO:109), HUMCEA_PEA_1_T9 (SEQ ID NO:110) and HUMCEA_PEA_1_T20 (SEQ ID NO:111). Table 946 below describes the starting and ending position of this segment on each transcript.

TABLE 946 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
HUMCEA_PEA_1_T8 (SEQ ID NO: 109)     1489                        1506
HUMCEA_PEA_1_T9 (SEQ ID NO: 110)     1089                        1106
HUMCEA_PEA_1_T20 (SEQ ID NO: 111)    1089                        1106

Segment cluster HUMCEA_PEA_1_node_19 (SEQ ID NO:834) according to the present invention is supported by 69 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8 (SEQ ID NO:109), HUMCEA_PEA_1_T9 (SEQ ID NO:110) and HUMCEA_PEA_1_T20 (SEQ ID NO:111). Table 947 below describes the starting and ending position of this segment on each transcript.

TABLE 947 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
HUMCEA_PEA_1_T8 (SEQ ID NO: 109)     1507                        1576
HUMCEA_PEA_1_T9 (SEQ ID NO: 110)     1107                        1176
HUMCEA_PEA_1_T20 (SEQ ID NO: 111)    1107                        1176

Segment cluster HUMCEA_PEA_1_node_20 (SEQ ID NO:835) according to the present invention can be found in the following transcript(s): HUMCEA_PEA_1_T8 (SEQ ID NO:109), HUMCEA_PEA_1_T9 (SEQ ID NO:110) and HUMCEA_PEA_1_T20 (SEQ ID NO:111). Table 948 below describes the starting and ending position of this segment on each transcript.

TABLE 948 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
HUMCEA_PEA_1_T8 (SEQ ID NO: 109)     1577                        1600
HUMCEA_PEA_1_T9 (SEQ ID NO: 110)     1177                        1200
HUMCEA_PEA_1_T20 (SEQ ID NO: 111)    1177                        1200

Segment cluster HUMCEA_PEA_1_node_21 (SEQ ID NO:836) according to the present invention can be found in the following transcript(s): HUMCEA_PEA_1_T8 (SEQ ID NO:109), HUMCEA_PEA_1_T9 (SEQ ID NO:110) and HUMCEA_PEA_1_T20 (SEQ ID NO:111). Table 949 below describes the starting and ending position of this segment on each transcript.

TABLE 949 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
HUMCEA_PEA_1_T8 (SEQ ID NO: 109)     1601                        1624
HUMCEA_PEA_1_T9 (SEQ ID NO: 110)     1201                        1224
HUMCEA_PEA_1_T20 (SEQ ID NO: 111)    1201                        1224

Segment cluster HUMCEA_PEA_1_node_22 (SEQ ID NO:837) according to the present invention is supported by 77 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8 (SEQ ID NO:109), HUMCEA_PEA_1_T9 (SEQ ID NO:110) and HUMCEA_PEA_1_T20 (SEQ ID NO:111). Table 950 below describes the starting and ending position of this segment on each transcript.

TABLE 950 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
HUMCEA_PEA_1_T8 (SEQ ID NO: 109)     1625                        1702
HUMCEA_PEA_1_T9 (SEQ ID NO: 110)     1225                        1302
HUMCEA_PEA_1_T20 (SEQ ID NO: 111)    1225                        1302

Segment cluster HUMCEA_PEA_1_node_23 (SEQ ID NO:838) according to the present invention is supported by 72 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8 (SEQ ID NO:109), HUMCEA_PEA_1_T9 (SEQ ID NO:110) and HUMCEA_PEA_1_T20 (SEQ ID NO:111). Table 951 below describes the starting and ending position of this segment on each transcript.

TABLE 951 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
HUMCEA_PEA_1_T8 (SEQ ID NO: 109)     1703                        1732
HUMCEA_PEA_1_T9 (SEQ ID NO: 110)     1303                        1332
HUMCEA_PEA_1_T20 (SEQ ID NO: 111)    1303                        1332

Segment cluster HUMCEA_PEA_1_node_24 (SEQ ID NO:839) according to the present invention can be found in the following transcript(s): HUMCEA_PEA_1_T8 (SEQ ID NO:109), HUMCEA_PEA_1_T9 (SEQ ID NO:110) and HUMCEA_PEA_1_T20 (SEQ ID NO:111). Table 952 below describes the starting and ending position of this segment on each transcript.

TABLE 952 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
HUMCEA_PEA_1_T8 (SEQ ID NO: 109)     1733                        1751
HUMCEA_PEA_1_T9 (SEQ ID NO: 110)     1333                        1351
HUMCEA_PEA_1_T20 (SEQ ID NO: 111)    1333                        1351

Segment cluster HUMCEA_PEA_1_node_27 (SEQ ID NO:840) according to the present invention can be found in the following transcript(s): HUMCEA_PEA_1_T8 (SEQ ID NO:109), HUMCEA_PEA_1_T9 (SEQ ID NO:110) and HUMCEA_PEA_1_T20 (SEQ ID NO:111). Table 953 below describes the starting and ending position of this segment on each transcript.

TABLE 953 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
HUMCEA_PEA_1_T8 (SEQ ID NO: 109)     1752                        1770
HUMCEA_PEA_1_T9 (SEQ ID NO: 110)     1352                        1370
HUMCEA_PEA_1_T20 (SEQ ID NO: 111)    1352                        1370

Segment cluster HUMCEA_PEA_1_node_29 (SEQ ID NO:841) according to the present invention can be found in the following transcript(s): HUMCEA_PEA_1_T8 (SEQ ID NO:109), HUMCEA_PEA_1_T9 (SEQ ID NO:110) and HUMCEA_PEA_1_T20 (SEQ ID NO:111). Table 954 below describes the starting and ending position of this segment on each transcript.

TABLE 954 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
HUMCEA_PEA_1_T8 (SEQ ID NO: 109)     1771                        1788
HUMCEA_PEA_1_T9 (SEQ ID NO: 110)     1371                        1388
HUMCEA_PEA_1_T20 (SEQ ID NO: 111)    1371                        1388

Segment cluster HUMCEA_PEA_1_node_30 (SEQ ID NO:842) according to the present invention is supported by 67 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8 (SEQ ID NO:109), HUMCEA_PEA_1_T9 (SEQ ID NO:110) and HUMCEA_PEA_1_T20 (SEQ ID NO:111). Table 955 below describes the starting and ending position of this segment on each transcript.

TABLE 955 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
HUMCEA_PEA_1_T8 (SEQ ID NO: 109)     1789                        1816
HUMCEA_PEA_1_T9 (SEQ ID NO: 110)     1389                        1416
HUMCEA_PEA_1_T20 (SEQ ID NO: 111)    1389                        1416

Segment cluster HUMCEA_PEA_1_node_33 (SEQ ID NO:843) according to the present invention can be found in the following transcript(s): HUMCEA_PEA_1_T8 (SEQ ID NO:109), HUMCEA_PEA_1_T9 (SEQ ID NO:110) and HUMCEA_PEA_1_T26 (SEQ ID NO:113). Table 956 below describes the starting and ending position of this segment on each transcript.

TABLE 956 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
HUMCEA_PEA_1_T8 (SEQ ID NO: 109)     2007                        2028
HUMCEA_PEA_1_T9 (SEQ ID NO: 110)     1607                        1628
HUMCEA_PEA_1_T26 (SEQ ID NO: 113)    539                         560

Segment cluster HUMCEA_PEA_1_node_34 (SEQ ID NO:844) according to the present invention is supported by 80 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8 (SEQ ID NO:109), HUMCEA_PEA_1_T9 (SEQ ID NO:110) and HUMCEA_PEA_1_T26 (SEQ ID NO:113). Table 957 below describes the starting and ending position of this segment on each transcript.

TABLE 957 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
HUMCEA_PEA_1_T8 (SEQ ID NO: 109)     2029                        2110
HUMCEA_PEA_1_T9 (SEQ ID NO: 110)     1629                        1710
HUMCEA_PEA_1_T26 (SEQ ID NO: 113)    561                         642

Segment cluster HUMCEA_PEA_1_node_35 (SEQ ID NO:845) according to the present invention is supported by 75 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8 (SEQ ID NO:109), HUMCEA_PEA_1_T9 (SEQ ID NO:110) and HUMCEA_PEA_1_T26 (SEQ ID NO:113). Table 958 below describes the starting and ending position of this segment on each transcript.

TABLE 958 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
HUMCEA_PEA_1_T8 (SEQ ID NO: 109)     2111                        2158
HUMCEA_PEA_1_T9 (SEQ ID NO: 110)     1711                        1758
HUMCEA_PEA_1_T26 (SEQ ID NO: 113)    643                         690

Segment cluster HUMCEA_PEA_1_node_45 (SEQ ID NO:846) according to the present invention is supported by 9 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T9 (SEQ ID NO:110). Table 959 below describes the starting and ending position of this segment on each transcript.

TABLE 959 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
HUMCEA_PEA_1_T9 (SEQ ID NO: 110)     2141                        2173

Segment cluster HUMCEA_PEA_1_node_50 (SEQ ID NO:847) according to the present invention is supported by 64 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8 (SEQ ID NO:109), HUMCEA_PEA_1_T25 (SEQ ID NO:112) and HUMCEA_PEA_1_T26 (SEQ ID NO:113). Table 960 below describes the starting and ending position of this segment on each transcript.

TABLE 960 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
HUMCEA_PEA_1_T8 (SEQ ID NO: 109)     2541                        2567
HUMCEA_PEA_1_T25 (SEQ ID NO: 112)    1073                        1099
HUMCEA_PEA_1_T26 (SEQ ID NO: 113)    1073                        1099

Segment cluster HUMCEA_PEA_1_node_51 (SEQ ID NO:848) according to the present invention is supported by 88 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8 (SEQ ID NO:109), HUMCEA_PEA_1_T25 (SEQ ID NO:112) and HUMCEA_PEA_1_T26 (SEQ ID NO:113). Table 961 below describes the starting and ending position of this segment on each transcript.

TABLE 961 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
HUMCEA_PEA_1_T8 (SEQ ID NO: 109)     2568                        2659
HUMCEA_PEA_1_T25 (SEQ ID NO: 112)    1100                        1191
HUMCEA_PEA_1_T26 (SEQ ID NO: 113)    1100                        1191

Segment cluster HUMCEA_PEA_1_node_56 (SEQ ID NO:849) according to the present invention is supported by 75 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8 (SEQ ID NO:109), HUMCEA_PEA_1_T25 (SEQ ID NO:112) and HUMCEA_PEA_1_T26 (SEQ ID NO:113). Table 962 below describes the starting and ending position of this segment on each transcript.

TABLE 962 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
HUMCEA_PEA_1_T8 (SEQ ID NO: 109)     2660                        2685
HUMCEA_PEA_1_T25 (SEQ ID NO: 112)    1192                        1217
HUMCEA_PEA_1_T26 (SEQ ID NO: 113)    1192                        1217

Segment cluster HUMCEA_PEA_1_node_57 (SEQ ID NO:850) according to the present invention is supported by 82 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8 (SEQ ID NO:109), HUMCEA_PEA_1_T25 (SEQ ID NO:112) and HUMCEA_PEA_1_T26 (SEQ ID NO:113). Table 963 below describes the starting and ending position of this segment on each transcript.

TABLE 963 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
HUMCEA_PEA_1_T8 (SEQ ID NO: 109)     2686                        2786
HUMCEA_PEA_1_T25 (SEQ ID NO: 112)    1218                        1318
HUMCEA_PEA_1_T26 (SEQ ID NO: 113)    1218                        1318

Segment cluster HUMCEA_PEA_1_node_58 (SEQ ID NO:851) according to the present invention is supported by 63 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8 (SEQ ID NO:109), HUMCEA_PEA_1_T25 (SEQ ID NO:112) and HUMCEA_PEA_1_T26 (SEQ ID NO:113). Table 964 below describes the starting and ending position of this segment on each transcript.

TABLE 964 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
HUMCEA_PEA_1_T8 (SEQ ID NO: 109)     2787                        2820
HUMCEA_PEA_1_T25 (SEQ ID NO: 112)    1319                        1352
HUMCEA_PEA_1_T26 (SEQ ID NO: 113)    1319                        1352

Segment cluster HUMCEA_PEA_1_node_60 (SEQ ID NO:852) according to the present invention is supported by 55 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8 (SEQ ID NO:109), HUMCEA_PEA_1_T25 (SEQ ID NO:112) and HUMCEA_PEA_1_T26 (SEQ ID NO:113). Table 965 below describes the starting and ending position of this segment on each transcript.

TABLE 965 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
HUMCEA_PEA_1_T8 (SEQ ID NO: 109)     2821                        2864
HUMCEA_PEA_1_T25 (SEQ ID NO: 112)    1353                        1396
HUMCEA_PEA_1_T26 (SEQ ID NO: 113)    1353                        1396

Segment cluster HUMCEA_PEA_1_node_61 (SEQ ID NO:853) according to the present invention can be found in the following transcript(s): HUMCEA_PEA_1_T8 (SEQ ID NO:109), HUMCEA_PEA_1_T25 (SEQ ID NO:112) and HUMCEA_PEA_1_T26 (SEQ ID NO:113). Table 966 below describes the starting and ending position of this segment on each transcript.

TABLE 966 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
HUMCEA_PEA_1_T8 (SEQ ID NO: 109)     2865                        2868
HUMCEA_PEA_1_T25 (SEQ ID NO: 112)    1397                        1400
HUMCEA_PEA_1_T26 (SEQ ID NO: 113)    1397                        1400

Segment cluster HUMCEA_PEA_1_node_62 (SEQ ID NO:854) according to the present invention is supported by 60 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8 (SEQ ID NO:109), HUMCEA_PEA_1_T25 (SEQ ID NO:112) and HUMCEA_PEA_1_T26 (SEQ ID NO:113). Table 967 below describes the starting and ending position of this segment on each transcript.

TABLE 967 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
HUMCEA_PEA_1_T8 (SEQ ID NO: 109)     2869                        2956
HUMCEA_PEA_1_T25 (SEQ ID NO: 112)    1401                        1488
HUMCEA_PEA_1_T26 (SEQ ID NO: 113)    1401                        1488

Segment cluster HUMCEA_PEA_1_node_64 (SEQ ID NO:855) according to the present invention is supported by 45 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMCEA_PEA_1_T8 (SEQ ID NO:109), HUMCEA_PEA_1_T25 (SEQ ID NO:112) and HUMCEA_PEA_1_T26 (SEQ ID NO:113). Table 968 below describes the starting and ending position of this segment on each transcript.

TABLE 968 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
HUMCEA_PEA_1_T8 (SEQ ID NO: 109)     3136                        3165
HUMCEA_PEA_1_T25 (SEQ ID NO: 112)    1668                        1697
HUMCEA_PEA_1_T26 (SEQ ID NO: 113)    1668                        1697
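
The segment location tables can be checked mechanically: a segment should have the same length on every transcript in which it appears. The following is a minimal sketch of such a check, assuming that the listed positions are 1-based and inclusive (the tables do not state the convention). The function names are hypothetical, and the positions are copied from Table 968.

```python
# Positions copied from Table 968; assumed to be 1-based and inclusive.
TABLE_968 = {
    "HUMCEA_PEA_1_T8": (3136, 3165),
    "HUMCEA_PEA_1_T25": (1668, 1697),
    "HUMCEA_PEA_1_T26": (1668, 1697),
}


def segment_length(start: int, end: int) -> int:
    """Length of a segment given inclusive start and end positions."""
    return end - start + 1


def consistent_length(locations: dict) -> int:
    """Return the segment length shared by all transcripts, or raise if they differ."""
    lengths = {segment_length(start, end) for start, end in locations.values()}
    if len(lengths) != 1:
        raise ValueError(f"inconsistent segment lengths: {sorted(lengths)}")
    return lengths.pop()


if __name__ == "__main__":
    print(consistent_length(TABLE_968))
```

Under the inclusive-position assumption, the same helper can be applied to any of the other segment location tables.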

Variant Protein Alignment to the Previously Known Protein:

Sequence name: CEA5_HUMAN (SEQ ID NO: 1451)
Sequence documentation:
Alignment of: HUMCEA_PEA_1_P4 (SEQ ID NO: 1380) × CEA5_HUMAN (SEQ ID NO: 1451)
Alignment segment 1/1:
Quality: 2320.00
Escore: 0
Matching length: 234
Total length: 234
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

Sequence name: CEA5_HUMAN (SEQ ID NO: 1451)
Sequence documentation:
Alignment of: HUMCEA_PEA_1_P5 (SEQ ID NO: 1381) × CEA5_HUMAN (SEQ ID NO: 1451)
Alignment segment 1/1:
Quality: 6692.00
Escore: 0
Matching length: 675
Total length: 675
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

Sequence name: CEA5_HUMAN (SEQ ID NO: 1451)
Sequence documentation:
Alignment of: HUMCEA_PEA_1_P19 (SEQ ID NO: 1383) × CEA5_HUMAN (SEQ ID NO: 1451) . . .
Alignment segment 1/1:
Quality: 3298.00
Escore: 0
Matching length: 346
Total length: 702
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 49.29
Total Percent Identity: 49.29
Gaps: 1
Alignment:

Sequence name: CEA5_HUMAN (SEQ ID NO: 1451)
Sequence documentation:
Alignment of: HUMCEA_PEA_1_P20 (SEQ ID NO: 1384) × CEA5_HUMAN (SEQ ID NO: 1451) . . .
Alignment segment 1/1:
Quality: 3294.00
Escore: 0
Matching length: 346
Total length: 702
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 49.29
Total Percent Identity: 49.29
Gaps: 1
Alignment:
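
The alignment summaries above report both "matching" and "total" percentages. The formula used by the alignment program is not stated in this document; the sketch below assumes that the total percent identity is the number of identical positions divided by the total alignment length. With 100.00% matching identity over a matching length of 346 and a total length of 702, that assumption reproduces the reported 49.29. The function name is hypothetical.

```python
def total_percent(matching_positions: int, total_length: int) -> float:
    """Percentage of the total alignment length covered by matching positions.

    Assumed formula; the alignment program's own definition is not given here.
    """
    if total_length <= 0:
        raise ValueError("total_length must be positive")
    return round(100.0 * matching_positions / total_length, 2)


if __name__ == "__main__":
    # Figures from the HUMCEA_PEA_1_P19 and HUMCEA_PEA_1_P4 alignment summaries.
    print(total_percent(346, 702))
    print(total_percent(234, 234))
```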

Description for Cluster R35137

Cluster R35137 features 6 transcript(s) and 20 segment(s) of interest, the names for which are given in Tables 969 and 970, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 971.

TABLE 969 Transcripts of interest

Transcript Name                    Sequence ID No.
R35137_PEA_1_PEA_1_PEA_1_T3        114
R35137_PEA_1_PEA_1_PEA_1_T5        115
R35137_PEA_1_PEA_1_PEA_1_T10       116
R35137_PEA_1_PEA_1_PEA_1_T11       117
R35137_PEA_1_PEA_1_PEA_1_T12       118
R35137_PEA_1_PEA_1_PEA_1_T14       119

TABLE 970 Segments of interest

Segment Name                         Sequence ID No.
R35137_PEA_1_PEA_1_PEA_1_node_2      856
R35137_PEA_1_PEA_1_PEA_1_node_3      857
R35137_PEA_1_PEA_1_PEA_1_node_9      858
R35137_PEA_1_PEA_1_PEA_1_node_11     859
R35137_PEA_1_PEA_1_PEA_1_node_16     860
R35137_PEA_1_PEA_1_PEA_1_node_18     861
R35137_PEA_1_PEA_1_PEA_1_node_20     862
R35137_PEA_1_PEA_1_PEA_1_node_27     863
R35137_PEA_1_PEA_1_PEA_1_node_5      864
R35137_PEA_1_PEA_1_PEA_1_node_7      865
R35137_PEA_1_PEA_1_PEA_1_node_12     866
R35137_PEA_1_PEA_1_PEA_1_node_14     867
R35137_PEA_1_PEA_1_PEA_1_node_15     868
R35137_PEA_1_PEA_1_PEA_1_node_17     869
R35137_PEA_1_PEA_1_PEA_1_node_21     870
R35137_PEA_1_PEA_1_PEA_1_node_22     871
R35137_PEA_1_PEA_1_PEA_1_node_23     872
R35137_PEA_1_PEA_1_PEA_1_node_24     873
R35137_PEA_1_PEA_1_PEA_1_node_25     874
R35137_PEA_1_PEA_1_PEA_1_node_26     875

TABLE 971 Proteins of interest

Protein Name                    Sequence ID No.   Corresponding Transcript(s)
R35137_PEA_1_PEA_1_PEA_1_P9     1385              R35137_PEA_1_PEA_1_PEA_1_T10 (SEQ ID NO: 116); R35137_PEA_1_PEA_1_PEA_1_T12 (SEQ ID NO: 118)
R35137_PEA_1_PEA_1_PEA_1_P8     1386              R35137_PEA_1_PEA_1_PEA_1_T11 (SEQ ID NO: 117)
R35137_PEA_1_PEA_1_PEA_1_P11    1387              R35137_PEA_1_PEA_1_PEA_1_T14 (SEQ ID NO: 119)
R35137_PEA_1_PEA_1_PEA_1_P2     1388              R35137_PEA_1_PEA_1_PEA_1_T3 (SEQ ID NO: 114)
R35137_PEA_1_PEA_1_PEA_1_P4     1389              R35137_PEA_1_PEA_1_PEA_1_T5 (SEQ ID NO: 115)

These sequences are variants of the known protein Alanine aminotransferase (SwissProt accession identifier ALAT_HUMAN; known also according to the synonyms EC 2.6.1.2; Glutamic-pyruvic transaminase; GPT; Glutamic-alanine transaminase), SEQ ID NO: 1452, referred to herein as the previously known protein.

Protein Alanine aminotransferase (SEQ ID NO:1452) is known or believed to have the following function(s): Participates in cellular nitrogen metabolism and also in liver gluconeogenesis starting with precursors transported from skeletal muscles. The sequence for protein Alanine aminotransferase is given at the end of the application, as “Alanine aminotransferase amino acid sequence”. Known polymorphisms for this sequence are as shown in Table 972.

TABLE 972 Amino acid mutations for Known Protein

SNP position(s) on amino acid sequence   Comment
13                                       H -> N (in allele GPT*2; dbSNP: 1063739). /FTId = VAR_000561.
3-6                                      STGD -> RRGN
38                                       G -> S
221                                      A -> H

Protein Alanine aminotransferase (SEQ ID NO:1452) localization is believed to be cytoplasmic.

Cluster R35137 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term “number” in the right hand column of the table and the numbers on the y-axis of FIG. 34 refer to weighted expression of ESTs in each category, as “parts per million” (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
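
The "parts per million" figure described above can be illustrated with a short calculation. This is only a sketch of the stated ratio (ESTs for the cluster over all ESTs in the category, scaled to one million); the weighting applied to the EST counts is not specified in this section and is not modeled. The function name and the example counts are hypothetical.

```python
def parts_per_million(cluster_ests: float, total_ests: float) -> float:
    """Ratio of a cluster's ESTs to all ESTs in a tissue category, per million."""
    if total_ests <= 0:
        raise ValueError("total_ests must be positive")
    return 1_000_000 * cluster_ests / total_ests


if __name__ == "__main__":
    # Hypothetical counts: 3 cluster ESTs among 150,000 ESTs in the category.
    print(parts_per_million(3, 150_000))
```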

Overall, the following results were obtained as shown with regard to the histograms in FIG. 34 and Table 973. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: hepatocellular carcinoma.

TABLE 973 Normal tissue distribution

Name of Tissue   Number
brain            12
epithelial       16
general          8
kidney           20
liver            0
lung             0
pancreas         2
prostate         0

TABLE 974 P values and ratios for expression in cancerous tissue

Name of Tissue   P1        P2        SP1       R3    SP2       R4
brain            3.2e−01   4.8e−01   1.8e−01   2.5   4.2e−01   1.5
epithelial       7.6e−01   7.7e−01   8.9e−01   0.5   9.8e−01   0.4
general          6.7e−01   8.2e−01   4.2e−01   1.0   8.5e−01   0.7
kidney           8.6e−01   9.0e−01   5.8e−01   0.9   7.0e−01   0.8
liver            1.8e−01   4.5e−01   3.0e−03   7.6   1.6e−01   2.3
lung             1         6.3e−01   1         1.0   6.2e−01   1.6
pancreas         2.3e−01   4.0e−01   1.8e−01   3.1   2.8e−01   2.3
prostate         1         7.8e−01   1         1.0   7.5e−01   1.3

As noted above, cluster R35137 features 6 transcript(s), which were listed in Table 969 above. These transcript(s) encode for protein(s) which are variant(s) of protein Alanine aminotransferase (SEQ ID NO:1452). A description of each variant protein according to the present invention is now provided.

Variant protein R35137_PEA_1_PEA_1_PEA_1_P9 (SEQ ID NO:1385) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R35137_PEA_1_PEA_1_PEA_1_T10 (SEQ ID NO:116). An alignment is given to the known protein (Alanine aminotransferase (SEQ ID NO:1452)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between R35137_PEA_1_PEA_1_PEA_1_P9 (SEQ ID NO:1385) and ALAT_HUMAN_V1 (SEQ ID NO:1453):

1. An isolated chimeric polypeptide encoding for R35137_PEA_1_PEA_1_PEA_1_P9 (SEQ ID NO:1385), comprising a first amino acid sequence being at least 90% homologous to MASSTGDRSQAVRHGLRAKVLTLDGMNPRVRRVEYAVRGPIVQRALELEQELRQGVKKPFTEVIRANIGDAQAMGQRPITFLRQVLALCVNPDLLSSPNFPDDAKKRAERILQACGGHSLGAYSVSSGIQLIREDVARYIERRDGGIPADPNNVFLSTGASDAIVTVLKLLVAGEGHTRTGVLIPIPQYPLYSATLAELGAVQVDYYLDEERAWALDVAELHRALGQARDHCRPRALCVINPGNPTGQVQTRECIEAVIRFAFEERLFLLADEV corresponding to amino acids 1-274 of ALAT_HUMAN_V1 (SEQ ID NO:1453), which also corresponds to amino acids 1-274 of R35137_PEA_1_PEA_1_PEA_1_P9 (SEQ ID NO:1385), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence RGAGEREAGQQSAPVTPCALPGVPGQRVRRGFAVPLIQEGAHGDGAALRRAAGACLLPLHLQGLHGRVRAYEAGGGSRAMARPSSPDGPPPPPHLTWPCAGAGSAAAMWRW (SEQ ID NO: 1737) corresponding to amino acids 275-385 of R35137_PEA_1_PEA_1_PEA_1_P9 (SEQ ID NO:1385), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of R35137_PEA_1_PEA_1_PEA_1_P9 (SEQ ID NO:1385), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence RGAGEREAGQQSAPVTPCALPGVPGQRVRRGFAVPLIQEGAHGDGAALRRAAGACLLPLHLQGLHGRVRAYEAGGGSRAMARPSSPDGPPPPPHLTWPCAGAGSAAAMWRW (SEQ ID NO: 1737) in R35137_PEA_1_PEA_1_PEA_1_P9 (SEQ ID NO:1385).

It should be noted that the known protein sequence (ALAT_HUMAN (SEQ ID NO:1452)) has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for ALAT_HUMAN_V1 (SEQ ID NO:1453). These changes were previously known to occur and are listed in the table below.

TABLE 975 Changes to ALAT_HUMAN_V1 (SEQ ID NO: 1453)

SNP position(s) on amino acid sequence   Type of change
1                                        init_met
222                                      conflict

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.

Variant protein R35137_PEA_1_PEA_1_PEA_1_P9 (SEQ ID NO:1385) is encoded by the following transcript(s): R35137_PEA_1_PEA_1_PEA_1_T10 (SEQ ID NO:116), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R35137_PEA_1_PEA_1_PEA_1_T10 (SEQ ID NO:116) is shown in bold; this coding portion starts at position 271 and ends at position 1425. The transcript also has the following SNPs as listed in Table 976 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R35137_PEA_1_PEA_1_PEA_1_P9 (SEQ ID NO:1385) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 976 Nucleic acid SNPs

SNP position on nucleotide sequence   Alternative nucleic acid   Previously known SNP?
230                                   C -> T                     No
231                                   C -> T                     No
310                                   C -> A                     Yes
432                                   G ->                       No
969                                   C ->                       No
1225                                  G ->                       No
1745                                  T -> G                     No
1957                                  C ->                       No
2018                                  G -> A                     No
2019                                  C -> A                     No
2101                                  A -> G                     No
2102                                  A -> G                     No
2159                                  C -> T                     Yes
2710                                  G -> C                     No
2789                                  C -> A                     Yes
3622                                  G -> A                     Yes
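
The stated coding portion can be cross-checked against the length of the encoded protein. The sketch below assumes that the start and end positions are 1-based and inclusive and that the stated coding portion does not include a stop codon; under those assumptions, positions 271 to 1425 span 1155 nucleotides, that is 385 codons, which agrees with the 385 amino acids described for R35137_PEA_1_PEA_1_PEA_1_P9 in the comparison report above. The function name is hypothetical.

```python
def encoded_length(cds_start: int, cds_end: int) -> int:
    """Number of codons in a coding portion given inclusive 1-based bounds.

    Assumes the coding portion is in frame and excludes the stop codon.
    """
    nucleotides = cds_end - cds_start + 1
    if nucleotides <= 0 or nucleotides % 3 != 0:
        raise ValueError("coding portion is empty or not a multiple of three")
    return nucleotides // 3


if __name__ == "__main__":
    # Coding portion of transcript R35137_PEA_1_PEA_1_PEA_1_T10 (SEQ ID NO:116).
    print(encoded_length(271, 1425))
```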

Variant protein R35137_PEA_1_PEA_1_PEA_1_P8 (SEQ ID NO:1386) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R35137_PEA_1_PEA_1_PEA_1_T11 (SEQ ID NO:117). An alignment is given to the known protein (Alanine aminotransferase (SEQ ID NO:1452)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between R35137_PEA_1_PEA_1_PEA_1_P8 (SEQ ID NO:1386) and ALAT_HUMAN_V1 (SEQ ID NO:1453):

1. An isolated chimeric polypeptide encoding for R35137_PEA_1_PEA_1_PEA_1_P8 (SEQ ID NO:1386), comprising a first amino acid sequence being at least 90% homologous to MASSTGDRSQAVRHGLRAKVLTLDGMNPRVRRVEYAVRGPIVQRALELEQELRQGVKKPFTEVIRANIGDAQAMGQRPITFLRQVLALCVNPDLLSSPNFPDDAKKRAERILQACGGHSLGAYSVSSGIQLIREDVARYIERRDGGIPADPNNVFLSTGASDAIVTVLKLLVAGEGHTRTGVLIPIPQYPLYSATLAELGAVQVDYYLDEERAWALDVAELHRALGQARDHCRPRALCVINPGNPTGQVQTRECIEAVIRFAFEERLFLLADEVYQDNVYAAGSQFHSFKKVLMEMGPPYAGQQELASFHSTSKGYMGEC corresponding to amino acids 1-320 of ALAT_HUMAN_V1 (SEQ ID NO:1453), which also corresponds to amino acids 1-320 of R35137_PEA_1_PEA_1_PEA_1_P8 (SEQ ID NO:1386), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRTRRVGARGPWPGPPRPMGHPLLRT (SEQ ID NO: 1738) corresponding to amino acids 321-346 of R35137_PEA_1_PEA_1_PEA_1_P8 (SEQ ID NO:1386), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of R35137_PEA_1_PEA_1_PEA_1_P8 (SEQ ID NO:1386), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRTRRVGARGPWPGPPRPMGHPLLRT (SEQ ID NO: 1738) in R35137_PEA_1_PEA_1_PEA_1_P8 (SEQ ID NO:1386).

It should be noted that the known protein sequence (ALAT_HUMAN (SEQ ID NO:1452)) has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for ALAT_HUMAN_V1 (SEQ ID NO:1453). These changes were previously known to occur and are listed in the table below.

TABLE 977 Changes to ALAT_HUMAN_V1 (SEQ ID NO: 1453)

SNP position(s) on amino acid sequence   Type of change
1                                        init_met
222                                      conflict

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.

Variant protein R35137_PEA_1_PEA_1_PEA_1_P8 (SEQ ID NO:1386) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 978 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R35137_PEA_1_PEA_1_PEA_1_P8 (SEQ ID NO:1386) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 978 Amino acid mutations

SNP position(s) on amino acid sequence   Alternative amino acid(s)   Previously known SNP?
14                                       H -> N                      Yes
54                                       Q ->                        No
233                                      R ->                        No
296                                      M ->                        No

Variant protein R35137_PEA_1_PEA_1_PEA_1_P8 (SEQ ID NO:1386) is encoded by the following transcript(s): R35137_PEA_1_PEA_1_PEA_1_T11 (SEQ ID NO:117), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R35137_PEA_1_PEA_1_PEA_1_T11 (SEQ ID NO:117) is shown in bold; this coding portion starts at position 271 and ends at position 1308. The transcript also has the following SNPs as listed in Table 979 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R35137_PEA_1_PEA_1_PEA_1_P8 (SEQ ID NO:1386) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 979 Nucleic acid SNPs

SNP position on nucleotide sequence   Alternative nucleic acid   Previously known SNP?
230                                   C -> T                     No
231                                   C -> T                     No
310                                   C -> A                     Yes
432                                   G ->                       No
969                                   C ->                       No
1158                                  G ->                       No
1752                                  T -> G                     No
2030                                  C ->                       No
2091                                  G -> A                     No
2092                                  C -> A                     No
2174                                  A -> G                     No
2175                                  A -> G                     No
2232                                  C -> T                     Yes
2783                                  G -> C                     No
2862                                  C -> A                     Yes
3695                                  G -> A                     Yes
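
The nucleotide SNP positions in Table 979 can be related to the amino acid positions in Table 978 through the stated coding portion (positions 271 to 1308). The following is a minimal sketch, assuming 1-based inclusive positions and a coding portion read in frame from its first nucleotide; the function name is hypothetical. SNPs outside the coding portion map to no amino acid.

```python
from typing import Optional


def codon_index(snp_pos: int, cds_start: int, cds_end: int) -> Optional[int]:
    """Return the 1-based amino acid position affected by a nucleotide SNP.

    Returns None when the SNP lies outside the coding portion.
    Positions are assumed to be 1-based and inclusive.
    """
    if snp_pos < cds_start or snp_pos > cds_end:
        return None
    return (snp_pos - cds_start) // 3 + 1


if __name__ == "__main__":
    # Coding-region SNPs of transcript R35137_PEA_1_PEA_1_PEA_1_T11 (Table 979).
    for pos in (310, 432, 969, 1158):
        print(pos, codon_index(pos, 271, 1308))
```

With these assumptions, nucleotide positions 310, 432, 969 and 1158 fall in codons 14, 54, 233 and 296, which are the amino acid positions listed in Table 978.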

Variant protein R35137_PEA_1_PEA_1_PEA_1_P11 (SEQ ID NO:1387) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R35137_PEA_1_PEA_1_PEA_1_T14 (SEQ ID NO:119). An alignment is given to the known protein (Alanine aminotransferase (SEQ ID NO:1452)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between R35137_PEA_1_PEA_1_PEA_1_P11 (SEQ ID NO:1387) and ALAT_HUMAN_V1 (SEQ ID NO:1453):

1. An isolated chimeric polypeptide encoding for R35137_PEA_1_PEA_1_PEA_1_P11 (SEQ ID NO:1387), comprising a first amino acid sequence being at least 90% homologous to MASSTGDRSQAVRHGLRAKVLTLDGMNPRVRRVEYAVRGPIVQRALELEQELRQGVKKPFTEVIRANIGDAQAMGQRPITFLRQVLALCVNPDLLSSPNFPDDAKKRAERILQACGGHSLGAYSVSSGIQLIREDVARYIERRDGGIPADPNNVFLSTGASDAIVTVLKLLVAGEGHTRTGVLIPIPQYPLYSATLAELGAVQVDYYLDEERAWALDVAELHRALGQAR corresponding to amino acids 1-229 of ALAT_HUMAN_V1 (SEQ ID NO:1453), which also corresponds to amino acids 1-229 of R35137_PEA_1_PEA_1_PEA_1_P11 (SEQ ID NO:1387), and a second amino acid sequence being at least 90% homologous to SGFGQREGTYHFRMTILPPLEKLRLLLEKLSRFHAKFTLEYS corresponding to amino acids 455-496 of ALAT_HUMAN_V1 (SEQ ID NO:1453), which also corresponds to amino acids 230-271 of R35137_PEA_1_PEA_1_PEA_1_P11 (SEQ ID NO:1387), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated chimeric polypeptide encoding for an edge portion of R35137_PEA_1_PEA_1_PEA_1_P11 (SEQ ID NO:1387), comprising a polypeptide having a length “n”, wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise RS, having a structure as follows: a sequence starting from any of amino acid numbers 229−x to 229; and ending at any of amino acid numbers 230+((n−2)−x), in which x varies from 0 to n−2.
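
The edge portion definition above can be read as a family of windows of length n that all span the junction between amino acids 229 and 230 (the residues R and S). The sketch below enumerates those windows for a given n, directly from the stated formula (start 229−x, end 230+((n−2)−x), with x from 0 to n−2). The function name is hypothetical, and the sketch does not clip windows to the length of the protein.

```python
def edge_windows(n: int, junction_left: int = 229) -> list:
    """Enumerate (start, end) windows of length n spanning the junction.

    junction_left is the last residue before the junction (229 here);
    the residue after the junction is junction_left + 1 (230 here).
    Positions are 1-based and inclusive.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to span both junction residues")
    windows = []
    for x in range(n - 1):  # x = 0 .. n-2
        start = junction_left - x
        end = (junction_left + 1) + ((n - 2) - x)
        windows.append((start, end))
    return windows


if __name__ == "__main__":
    for start, end in edge_windows(10):
        print(start, end)
```

Each window produced this way has exactly n residues and contains both position 229 and position 230, which is what places the RS pair inside every edge portion.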

It should be noted that the known protein sequence (ALAT_HUMAN (SEQ ID NO:1452)) has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for ALAT_HUMAN_V1 (SEQ ID NO:1453). These changes were previously known to occur and are listed in the table below.

TABLE 980 Changes to ALAT_HUMAN_V1 (SEQ ID NO: 1453)

SNP position(s) on amino acid sequence   Type of change
1                                        init_met
222                                      conflict

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.

Variant protein R35137_PEA_1_PEA_1_PEA_1_P11 (SEQ ID NO:1387) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 981 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R35137_PEA_1_PEA_1_PEA_1_P11 (SEQ ID NO:1387) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 981 Amino acid mutations

SNP position(s) on amino acid sequence   Alternative amino acid(s)   Previously known SNP?
14                                       H -> N                      Yes
54                                       Q ->                        No

Variant protein R35137_PEA_1_PEA_1_PEA_1_P11 (SEQ ID NO:1387) is encoded by the following transcript(s): R35137_PEA_1_PEA_1_PEA_1_T14 (SEQ ID NO:119), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R35137_PEA_1_PEA_1_PEA_1_T14 (SEQ ID NO:119) is shown in bold; this coding portion starts at position 271 and ends at position 1083. The transcript also has the following SNPs as listed in Table 982 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R35137_PEA_1_PEA_1_PEA_1_P11 (SEQ ID NO:1387) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 982 Nucleic acid SNPs

SNP position on nucleotide sequence   Alternative nucleic acid   Previously known SNP?
230                                   C -> T                     No
231                                   C -> T                     No
310                                   C -> A                     Yes
432                                   G ->                       No
1115                                  C ->                       No
1176                                  G -> A                     No
1177                                  C -> A                     No

Variant protein R35137_PEA_1_PEA_1_PEA_1_P2 (SEQ ID NO:1388) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R35137_PEA_1_PEA_1_PEA_1_T3 (SEQ ID NO:114). An alignment is given to the known protein (Alanine aminotransferase (SEQ ID NO:1452)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between R35137_PEA_1_PEA_1_PEA_1_P2 (SEQ ID NO:1388) and ALAT_HUMAN_V1 (SEQ ID NO:1453):

1. An isolated chimeric polypeptide encoding for R35137_PEA_1_PEA_1_PEA_1_P2 (SEQ ID NO:1388), comprising a first amino acid sequence being at least 90% homologous to MASSTGDRSQAVRHGLRAKVLTLDGMNPRVRRVEYAVRGPIVQRALELEQELRQGVKKPFTEVIRANIGDAQAMGQRPITFLRQVLALCVNPDLLSSPNFPDDAKKRAERILQACGGHSLGAYSVSSGIQLIREDVARYIERRDGGIPADPNNVFLSTGASDAIVTVLKLLVAGEGHTRTGVLIPIPQYPLYSATLAELGAVQVDYYLDEERAWALDVAELHRALGQARDHCRPRALCVINPGNPTGQVQTRECIEAVIRFAFEERLFLLADEV corresponding to amino acids 1-274 of ALAT_HUMAN_V1 (SEQ ID NO:1453), which also corresponds to amino acids 1-274 of R35137_PEA_1_PEA_1_PEA_1_P2 (SEQ ID NO:1388), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence RGAGEREAGQQSAPVTPCALPGVPGQRVRRGFAVPLIQEGAHGDGAALRRAAGACLLPLHLQGLHGRVRVPRRLCGGGEHGRCSAAADAEADECAAVPAGARTGPAGPGGQPARAHRPLLCAVPG (SEQ ID NO: 1739) corresponding to amino acids 275-399 of R35137_PEA_1_PEA_1_PEA_1_P2 (SEQ ID NO:1388), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of R35137_PEA_1_PEA_1_PEA_1_P2 (SEQ ID NO:1388), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence RGAGEREAGQQSAPVTPCALPGVPGQRVRRGFAVPLIQEGAHGDGAALRRAAGACLLPLHLQGLHGRVRVPRRLCGGGEHGRCSAAADAEADECAAVPAGARTGPAGPGGQPARAHRPLLCAVPG (SEQ ID NO: 1739) in R35137_PEA_1_PEA_1_PEA_1_P2 (SEQ ID NO:1388).

It should be noted that the known protein sequence (ALAT_HUMAN (SEQ ID NO:1452)) has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for ALAT_HUMAN_V1 (SEQ ID NO:1453). These changes were previously known to occur and are listed in the table below.

TABLE 983 Changes to ALAT_HUMAN_V1 (SEQ ID NO: 1453)

SNP position(s) on amino acid sequence   Type of change
1                                        init_met
222                                      conflict

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.

Variant protein R35137_PEA_1_PEA_1_PEA_1_P2 (SEQ ID NO:1388) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 984 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R35137_PEA_1_PEA_1_PEA_1_P2 (SEQ ID NO:1388) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 984 Amino acid mutations

SNP position(s) on amino acid sequence   Alternative amino acid(s)   Previously known SNP?
14                                       H -> N                      Yes
54                                       Q ->                        No
233                                      R ->                        No
319                                      G ->                        No

Variant protein R35137_PEA_1_PEA_1_PEA_1_P2 (SEQ ID NO:1388) is encoded by the following transcript(s): R35137_PEA_1_PEA_1_PEA_1_T3 (SEQ ID NO:114), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R35137_PEA_1_PEA_1_PEA_1_T3 (SEQ ID NO:114) is shown in bold; this coding portion starts at position 271 and ends at position 1467. The transcript also has the following SNPs as listed in Table 985 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R35137_PEA_1_PEA_1_PEA_1_P2 (SEQ ID NO:1388) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 985 Nucleic acid SNPs
SNP position on nucleotide sequence   Alternative nucleic acid   Previously known SNP?
230                                   C -> T                     No
231                                   C -> T                     No
310                                   C -> A                     Yes
432                                   G ->                       No
969                                   C ->                       No
1225                                  G ->                       No
1645                                  T -> G                     No
1857                                  C ->                       No
1918                                  G -> A                     No
1919                                  C -> A                     No
2001                                  A -> G                     No
2002                                  A -> G                     No
2059                                  C -> T                     Yes
2610                                  G -> C                     No
2689                                  C -> A                     Yes
3522                                  G -> A                     Yes
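The relationship between the nucleotide positions of Table 985 and the amino acid positions of Table 984 can be illustrated with a short sketch. It assumes only that the coding portion is read in consecutive, non-overlapping codons starting at position 271; the function name `codon_index` is introduced here for illustration and does not appear elsewhere in the application.

```python
# Sketch: map 1-based transcript positions to 1-based amino acid positions.
# CDS_START and CDS_END are the coding-portion bounds stated for transcript T3.
CDS_START = 271
CDS_END = 1467


def codon_index(nt_pos, cds_start=CDS_START, cds_end=CDS_END):
    """Return the 1-based amino acid position for a transcript position,
    or None when the position lies outside the coding portion."""
    if nt_pos < cds_start or nt_pos > cds_end:
        return None
    return (nt_pos - cds_start) // 3 + 1


# SNP positions copied from Table 985.
snp_positions = [230, 231, 310, 432, 969, 1225, 1645, 1857, 1918, 1919,
                 2001, 2002, 2059, 2610, 2689, 3522]

# Keep only the SNPs that fall inside the coding portion.
coding_hits = {p: codon_index(p) for p in snp_positions
               if codon_index(p) is not None}
print(coding_hits)  # {310: 14, 432: 54, 969: 233, 1225: 319}
```

Under this assumption, the four SNPs inside the coding portion fall on amino acid positions 14, 54, 233 and 319, which are the positions listed in Table 984.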

Variant protein R35137_PEA_1_PEA_1_PEA_1_P4 (SEQ ID NO:1389) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R35137_PEA_1_PEA_1_PEA_1_T5 (SEQ ID NO:115). An alignment is given to the known protein (Alanine aminotransferase (SEQ ID NO:1452)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between R35137_PEA_1_PEA_1_PEA_1_P4 (SEQ ID NO:1389) and ALAT_HUMAN_V1 (SEQ ID NO:1453):

1. An isolated chimeric polypeptide encoding for R35137_PEA_1_PEA_1_PEA_1_P4 (SEQ ID NO:1389), comprising a first amino acid sequence being at least 90% homologous to MASSTGDRSQAVRHGLRAKVLTLDGMNPRVRRVEYAVRGPIVQRALELEQELRQGVKKPFTEVIRANIGDAQAMGQRPITFLRQVLALCVNPDLLSSPNFPDDAKKRAERILQACGGHSLGAYSVSSGIQLIREDVARYIERRDGGIPADPNNVFLSTGASDAIVTVLKLLVAGEGHTRTGVLIPIPQYPLYSATLAELGAVQVDYYLDEERAWALDVAELHRALGQARDHCRPRALCVINPGNPTGQVQTRECIEAVIRFAFEERLFLLADEVYQDNVYAAGSQFHSFKKVLMEMGPPYAGQQELASFHSTSKGYMGECGFRGGYVEVVNMDAAVQQQMLKLMSVRLCPPVPGQALLDLVVSPPAPTDPSFAQFQAEKQAVLAELAAKAKLTEQVFNEAPGISCNPVQGAMYSFPRVQLPPRAVERAQELGLAPDMFFCLRLLEETGICVVPGSGFGQREGTYHFRMTILPPLEKLRLLLEKLSRFHAKFTLE corresponding to amino acids 1-494 of ALAT_HUMAN_V1 (SEQ ID NO:1453), which also corresponds to amino acids 1-494 of R35137_PEA_1_PEA_1_PEA_1_P4 (SEQ ID NO:1389), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SPGRLWSPLYLLLMPGGVGWGGCWAPASLQVPNKAVWQSDSKKEALAAAWPAPTCLPFLQA (SEQ ID NO: 1740) corresponding to amino acids 495-555 of R35137_PEA_1_PEA_1_PEA_1_P4 (SEQ ID NO:1389), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of R35137_PEA_1_PEA_1_PEA_1_P4 (SEQ ID NO:1389), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SPGRLWSPLYLLLMPGGVGWGGCWAPASLQVPNKAVWQSDSKKEALAAAWPAPTCLPFLQA (SEQ ID NO: 1740) in R35137_PEA_1_PEA_1_PEA_1_P4 (SEQ ID NO:1389).

It should be noted that the known protein sequence (ALAT_HUMAN (SEQ ID NO:1452)) differs by one or more changes from the sequence given at the end of the application and named as being the amino acid sequence for ALAT_HUMAN_V1 (SEQ ID NO:1453). These changes were previously known to occur and are listed in the table below.

TABLE 986 Changes to ALAT_HUMAN_V1 (SEQ ID NO: 1453)
SNP position(s) on amino acid sequence   Type of change
1                                        init_met
222                                      conflict

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.

Variant protein R35137_PEA_1_PEA_1_PEA_1_P4 (SEQ ID NO:1389) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 987 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R35137_PEA_1_PEA_1_PEA_1_P4 (SEQ ID NO:1389) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 987 Amino acid mutations
SNP position(s) on amino acid sequence   Alternative amino acid(s)   Previously known SNP?
14                                       H -> N                      Yes
54                                       Q ->                        No
233                                      R ->                        No
296                                      M ->                        No
436                                      D -> E                      No
508                                      M -> I                      No
509                                      P -> T                      No
536                                      K -> R                      No

Variant protein R35137_PEA_1_PEA_1_PEA_1_P4 (SEQ ID NO:1389) is encoded by the following transcript(s): R35137_PEA_1_PEA_1_PEA_1_T5 (SEQ ID NO:115), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R35137_PEA_1_PEA_1_PEA_1_T5 (SEQ ID NO:115) is shown in bold; this coding portion starts at position 271 and ends at position 1935. The transcript also has the following SNPs as listed in Table 988 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R35137_PEA_1_PEA_1_PEA_1_P4 (SEQ ID NO:1389) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 988 Nucleic acid SNPs
SNP position on nucleotide sequence   Alternative nucleic acid   Previously known SNP?
230                                   C -> T                     No
231                                   C -> T                     No
310                                   C -> A                     Yes
432                                   G ->                       No
969                                   C ->                       No
1158                                  G ->                       No
1578                                  T -> G                     No
1794                                  G -> A                     No
1795                                  C -> A                     No
1877                                  A -> G                     No
1878                                  A -> G                     No
1935                                  C -> T                     Yes
2486                                  G -> C                     No
2565                                  C -> A                     Yes
3398                                  G -> A                     Yes
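The stated coding portion of transcript T5 can be cross-checked against the stated layout of variant protein P4 (amino acids 1-494 shared with ALAT_HUMAN_V1, followed by the tail at amino acids 495-555). The following is a minimal sketch of such a check; the helper names `span_length` and `tail_residue` are hypothetical and are introduced only for this illustration.

```python
# Sketch: compare the T5 coding portion with the head-plus-tail layout of P4.
CDS_START, CDS_END = 271, 1935   # coding portion of transcript T5
HEAD_RANGE = (1, 494)            # residues shared with ALAT_HUMAN_V1
TAIL_RANGE = (495, 555)          # residues unique to the variant
TAIL = "SPGRLWSPLYLLLMPGGVGWGGCWAPASLQVPNKAVWQSDSKKEALAAAWPAPTCLPFLQA"  # SEQ ID NO: 1740


def span_length(first, last):
    """Length of an inclusive 1-based range."""
    return last - first + 1


def tail_residue(protein_pos):
    """Return the tail residue at a 1-based position on the full variant protein."""
    if not TAIL_RANGE[0] <= protein_pos <= TAIL_RANGE[1]:
        raise ValueError("position is outside the tail")
    return TAIL[protein_pos - TAIL_RANGE[0]]


codons, remainder = divmod(span_length(CDS_START, CDS_END), 3)
protein_length = span_length(*HEAD_RANGE) + span_length(*TAIL_RANGE)

print(codons, remainder, protein_length, len(TAIL))  # 555 0 555 61
print(tail_residue(508), tail_residue(509), tail_residue(536))  # M P K
```

The coding portion spans a whole number of codons equal to the protein length, and the residues found at tail positions 508, 509 and 536 agree with the reference amino acids listed for those positions in Table 987.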

As noted above, cluster R35137 features 20 segment(s), which were listed in Table 970 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster R35137_PEA_1_PEA_1_PEA_1_node_2 (SEQ ID NO:856) according to the present invention is supported by 19 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R35137_PEA_1_PEA_1_PEA_1_T3 (SEQ ID NO:114), R35137_PEA_1_PEA_1_PEA_1_T5 (SEQ ID NO:115), R35137_PEA_1_PEA_1_PEA_1_T10 (SEQ ID NO:116), R35137_PEA_1_PEA_1_PEA_1_T11 (SEQ ID NO:117), R35137_PEA_1_PEA_1_PEA_1_T12 (SEQ ID NO:118) and R35137_PEA_1_PEA_1_PEA_1_T14 (SEQ ID NO:119). Table 989 below describes the starting and ending position of this segment on each transcript.

TABLE 989 Segment location on transcripts
Transcript name                                 Segment starting position   Segment ending position
R35137_PEA_1_PEA_1_PEA_1_T3 (SEQ ID NO: 114)    1                           266
R35137_PEA_1_PEA_1_PEA_1_T5 (SEQ ID NO: 115)    1                           266
R35137_PEA_1_PEA_1_PEA_1_T10 (SEQ ID NO: 116)   1                           266
R35137_PEA_1_PEA_1_PEA_1_T11 (SEQ ID NO: 117)   1                           266
R35137_PEA_1_PEA_1_PEA_1_T12 (SEQ ID NO: 118)   1                           266
R35137_PEA_1_PEA_1_PEA_1_T14 (SEQ ID NO: 119)   1                           266

Segment cluster R35137_PEA_1_PEA_1_PEA_1_node_3 (SEQ ID NO:857) according to the present invention is supported by 24 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R35137_PEA_1_PEA_1_PEA_1_T3 (SEQ ID NO:114), R35137_PEA_1_PEA_1_PEA_1_T5 (SEQ ID NO:115), R35137_PEA_1_PEA_1_PEA_1_T10 (SEQ ID NO:116), R35137_PEA_1_PEA_1_PEA_1_T11 (SEQ ID NO:117), R35137_PEA_1_PEA_1_PEA_1_T12 (SEQ ID NO:118) and R35137_PEA_1_PEA_1_PEA_1_T14 (SEQ ID NO:119). Table 990 below describes the starting and ending position of this segment on each transcript.

TABLE 990 Segment location on transcripts
Transcript name                                 Segment starting position   Segment ending position
R35137_PEA_1_PEA_1_PEA_1_T3 (SEQ ID NO: 114)    267                         432
R35137_PEA_1_PEA_1_PEA_1_T5 (SEQ ID NO: 115)    267                         432
R35137_PEA_1_PEA_1_PEA_1_T10 (SEQ ID NO: 116)   267                         432
R35137_PEA_1_PEA_1_PEA_1_T11 (SEQ ID NO: 117)   267                         432
R35137_PEA_1_PEA_1_PEA_1_T12 (SEQ ID NO: 118)   267                         432
R35137_PEA_1_PEA_1_PEA_1_T14 (SEQ ID NO: 119)   267                         432

Segment cluster R35137_PEA_1_PEA_1_PEA_1_node_9 (SEQ ID NO:858) according to the present invention is supported by 25 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R35137_PEA_1_PEA_1_PEA_1_T3 (SEQ ID NO:114), R35137_PEA_1_PEA_1_PEA_1_T5 (SEQ ID NO:115), R35137_PEA_1_PEA_1_PEA_1_T10 (SEQ ID NO:116), R35137_PEA_1_PEA_1_PEA_1_T11 (SEQ ID NO:117), R35137_PEA_1_PEA_1_PEA_1_T12 (SEQ ID NO:118) and R35137_PEA_1_PEA_1_PEA_1_T14 (SEQ ID NO:119). Table 991 below describes the starting and ending position of this segment on each transcript.

TABLE 991 Segment location on transcripts
Transcript name                                 Segment starting position   Segment ending position
R35137_PEA_1_PEA_1_PEA_1_T3 (SEQ ID NO: 114)    632                         765
R35137_PEA_1_PEA_1_PEA_1_T5 (SEQ ID NO: 115)    632                         765
R35137_PEA_1_PEA_1_PEA_1_T10 (SEQ ID NO: 116)   632                         765
R35137_PEA_1_PEA_1_PEA_1_T11 (SEQ ID NO: 117)   632                         765
R35137_PEA_1_PEA_1_PEA_1_T12 (SEQ ID NO: 118)   632                         765
R35137_PEA_1_PEA_1_PEA_1_T14 (SEQ ID NO: 119)   632                         765

Segment cluster R35137_PEA_1_PEA_1_PEA_1_node_11 (SEQ ID NO:859) according to the present invention is supported by 30 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R35137_PEA_1_PEA_1_PEA_1_T3 (SEQ ID NO:114), R35137_PEA_1_PEA_1_PEA_1_T5 (SEQ ID NO:115), R35137_PEA_1_PEA_1_PEA_1_T10 (SEQ ID NO:116), R35137_PEA_1_PEA_1_PEA_1_T11 (SEQ ID NO:117), R35137_PEA_1_PEA_1_PEA_1_T12 (SEQ ID NO:118) and R35137_PEA_1_PEA_1_PEA_1_T14 (SEQ ID NO:119). Table 992 below describes the starting and ending position of this segment on each transcript.

TABLE 992 Segment location on transcripts
Transcript name                                 Segment starting position   Segment ending position
R35137_PEA_1_PEA_1_PEA_1_T3 (SEQ ID NO: 114)    766                         955
R35137_PEA_1_PEA_1_PEA_1_T5 (SEQ ID NO: 115)    766                         955
R35137_PEA_1_PEA_1_PEA_1_T10 (SEQ ID NO: 116)   766                         955
R35137_PEA_1_PEA_1_PEA_1_T11 (SEQ ID NO: 117)   766                         955
R35137_PEA_1_PEA_1_PEA_1_T12 (SEQ ID NO: 118)   766                         955
R35137_PEA_1_PEA_1_PEA_1_T14 (SEQ ID NO: 119)   766                         955

Segment cluster R35137_PEA_1_PEA_1_PEA_1_node_16 (SEQ ID NO:860) according to the present invention is supported by 23 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R35137_PEA_1_PEA_1_PEA_1_T3 (SEQ ID NO:114), R35137_PEA_1_PEA_1_PEA_1_T5 (SEQ ID NO:115), R35137_PEA_1_PEA_1_PEA_1_T10 (SEQ ID NO:116), R35137_PEA_1_PEA_1_PEA_1_T11 (SEQ ID NO:117) and R35137_PEA_1_PEA_1_PEA_1_T12 (SEQ ID NO:118). Table 993 below describes the starting and ending position of this segment on each transcript.

TABLE 993 Segment location on transcripts
Transcript name                                 Segment starting position   Segment ending position
R35137_PEA_1_PEA_1_PEA_1_T3 (SEQ ID NO: 114)    1157                        1293
R35137_PEA_1_PEA_1_PEA_1_T5 (SEQ ID NO: 115)    1090                        1226
R35137_PEA_1_PEA_1_PEA_1_T10 (SEQ ID NO: 116)   1157                        1293
R35137_PEA_1_PEA_1_PEA_1_T11 (SEQ ID NO: 117)   1090                        1226
R35137_PEA_1_PEA_1_PEA_1_T12 (SEQ ID NO: 118)   1157                        1293

Segment cluster R35137_PEA_1_PEA_1_PEA_1_node_18 (SEQ ID NO:861) according to the present invention is supported by 24 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R35137_PEA_1_PEA_1_PEA_1_T3 (SEQ ID NO:114), R35137_PEA_1_PEA_1_PEA_1_T5 (SEQ ID NO:115), R35137_PEA_1_PEA_1_PEA_1_T10 (SEQ ID NO:116), R35137_PEA_1_PEA_1_PEA_1_T11 (SEQ ID NO:117) and R35137_PEA_1_PEA_1_PEA_1_T12 (SEQ ID NO:118). Table 994 below describes the starting and ending position of this segment on each transcript.

TABLE 994 Segment location on transcripts
Transcript name                                 Segment starting position   Segment ending position
R35137_PEA_1_PEA_1_PEA_1_T3 (SEQ ID NO: 114)    1294                        1468
R35137_PEA_1_PEA_1_PEA_1_T5 (SEQ ID NO: 115)    1227                        1401
R35137_PEA_1_PEA_1_PEA_1_T10 (SEQ ID NO: 116)   1394                        1568
R35137_PEA_1_PEA_1_PEA_1_T11 (SEQ ID NO: 117)   1327                        1501
R35137_PEA_1_PEA_1_PEA_1_T12 (SEQ ID NO: 118)   1394                        1568

Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment (in relation to lung cancer), shown in Table 995.

TABLE 995 Oligonucleotides related to this segment
Oligonucleotide name             Overexpressed in cancers   Chip reference
R35137_0_5_0 (SEQ ID NO: 245)    lung malignant tumors      LUN

Segment cluster R35137_PEA_1_PEA_1_PEA_1_node_20 (SEQ ID NO:862) according to the present invention is supported by 29 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R35137_PEA_1_PEA_1_PEA_1_T3 (SEQ ID NO:114), R35137_PEA_1_PEA_1_PEA_1_T5 (SEQ ID NO:115), R35137_PEA_1_PEA_1_PEA_1_T10 (SEQ ID NO:116), R35137_PEA_1_PEA_1_PEA_1_T11 (SEQ ID NO:117) and R35137_PEA_1_PEA_1_PEA_1_T12 (SEQ ID NO:118). Table 996 below describes the starting and ending position of this segment on each transcript.

TABLE 996 Segment location on transcripts
Transcript name                                 Segment starting position   Segment ending position
R35137_PEA_1_PEA_1_PEA_1_T3 (SEQ ID NO: 114)    1469                        1624
R35137_PEA_1_PEA_1_PEA_1_T5 (SEQ ID NO: 115)    1402                        1557
R35137_PEA_1_PEA_1_PEA_1_T10 (SEQ ID NO: 116)   1569                        1724
R35137_PEA_1_PEA_1_PEA_1_T11 (SEQ ID NO: 117)   1502                        1657
R35137_PEA_1_PEA_1_PEA_1_T12 (SEQ ID NO: 118)   1569                        1724

Segment cluster R35137_PEA_1_PEA_1_PEA_1_node_27 (SEQ ID NO:863) according to the present invention is supported by 39 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R35137_PEA_1_PEA_1_PEA_1_T3 (SEQ ID NO:114), R35137_PEA_1_PEA_1_PEA_1_T5 (SEQ ID NO:115), R35137_PEA_1_PEA_1_PEA_1_T10 (SEQ ID NO:116), R35137_PEA_1_PEA_1_PEA_1_T11 (SEQ ID NO:117), R35137_PEA_1_PEA_1_PEA_1_T12 (SEQ ID NO:118) and R35137_PEA_1_PEA_1_PEA_1_T14 (SEQ ID NO:119). Table 997 below describes the starting and ending position of this segment on each transcript.

TABLE 997 Segment location on transcripts
Transcript name                                 Segment starting position   Segment ending position
R35137_PEA_1_PEA_1_PEA_1_T3 (SEQ ID NO: 114)    1876                        3898
R35137_PEA_1_PEA_1_PEA_1_T5 (SEQ ID NO: 115)    1752                        3774
R35137_PEA_1_PEA_1_PEA_1_T10 (SEQ ID NO: 116)   1976                        3998
R35137_PEA_1_PEA_1_PEA_1_T11 (SEQ ID NO: 117)   2049                        4071
R35137_PEA_1_PEA_1_PEA_1_T12 (SEQ ID NO: 118)   2116                        4138
R35137_PEA_1_PEA_1_PEA_1_T14 (SEQ ID NO: 119)   1134                        1250

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.

Segment cluster R35137_PEA_1_PEA_1_PEA_1_node_5 (SEQ ID NO:864) according to the present invention is supported by 20 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R35137_PEA_1_PEA_1_PEA_1_T3 (SEQ ID NO:114), R35137_PEA_1_PEA_1_PEA_1_T5 (SEQ ID NO:115), R35137_PEA_1_PEA_1_PEA_1_T10 (SEQ ID NO:116), R35137_PEA_1_PEA_1_PEA_1_T11 (SEQ ID NO:117), R35137_PEA_1_PEA_1_PEA_1_T12 (SEQ ID NO:118) and R35137_PEA_1_PEA_1_PEA_1_T14 (SEQ ID NO:119). Table 998 below describes the starting and ending position of this segment on each transcript.

TABLE 998 Segment location on transcripts
Transcript name                                 Segment starting position   Segment ending position
R35137_PEA_1_PEA_1_PEA_1_T3 (SEQ ID NO: 114)    433                         522
R35137_PEA_1_PEA_1_PEA_1_T5 (SEQ ID NO: 115)    433                         522
R35137_PEA_1_PEA_1_PEA_1_T10 (SEQ ID NO: 116)   433                         522
R35137_PEA_1_PEA_1_PEA_1_T11 (SEQ ID NO: 117)   433                         522
R35137_PEA_1_PEA_1_PEA_1_T12 (SEQ ID NO: 118)   433                         522
R35137_PEA_1_PEA_1_PEA_1_T14 (SEQ ID NO: 119)   433                         522

Segment cluster R35137_PEA_1_PEA_1_PEA_1_node_7 (SEQ ID NO:865) according to the present invention is supported by 23 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R35137_PEA_1_PEA_1_PEA_1_T3 (SEQ ID NO:114), R35137_PEA_1_PEA_1_PEA_1_T5 (SEQ ID NO:115), R35137_PEA_1_PEA_1_PEA_1_T10 (SEQ ID NO:116), R35137_PEA_1_PEA_1_PEA_1_T11 (SEQ ID NO:117), R35137_PEA_1_PEA_1_PEA_1_T12 (SEQ ID NO:118) and R35137_PEA_1_PEA_1_PEA_1_T14 (SEQ ID NO:119). Table 999 below describes the starting and ending position of this segment on each transcript.

TABLE 999 Segment location on transcripts
Transcript name                                 Segment starting position   Segment ending position
R35137_PEA_1_PEA_1_PEA_1_T3 (SEQ ID NO: 114)    523                         631
R35137_PEA_1_PEA_1_PEA_1_T5 (SEQ ID NO: 115)    523                         631
R35137_PEA_1_PEA_1_PEA_1_T10 (SEQ ID NO: 116)   523                         631
R35137_PEA_1_PEA_1_PEA_1_T11 (SEQ ID NO: 117)   523                         631
R35137_PEA_1_PEA_1_PEA_1_T12 (SEQ ID NO: 118)   523                         631
R35137_PEA_1_PEA_1_PEA_1_T14 (SEQ ID NO: 119)   523                         631

Segment cluster R35137_PEA_1_PEA_1_PEA_1_node_12 (SEQ ID NO:866) according to the present invention is supported by 22 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R35137_PEA_1_PEA_1_PEA_1_T3 (SEQ ID NO:114), R35137_PEA_1_PEA_1_PEA_1_T5 (SEQ ID NO:115), R35137_PEA_1_PEA_1_PEA_1_T10 (SEQ ID NO:116), R35137_PEA_1_PEA_1_PEA_1_T11 (SEQ ID NO:117) and R35137_PEA_1_PEA_1_PEA_1_T12 (SEQ ID NO:118). Table 1000 below describes the starting and ending position of this segment on each transcript.

TABLE 1000 Segment location on transcripts
Transcript name                                 Segment starting position   Segment ending position
R35137_PEA_1_PEA_1_PEA_1_T3 (SEQ ID NO: 114)    956                         1009
R35137_PEA_1_PEA_1_PEA_1_T5 (SEQ ID NO: 115)    956                         1009
R35137_PEA_1_PEA_1_PEA_1_T10 (SEQ ID NO: 116)   956                         1009
R35137_PEA_1_PEA_1_PEA_1_T11 (SEQ ID NO: 117)   956                         1009
R35137_PEA_1_PEA_1_PEA_1_T12 (SEQ ID NO: 118)   956                         1009

Segment cluster R35137_PEA_1_PEA_1_PEA_1_node_14 (SEQ ID NO:867) according to the present invention is supported by 23 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R35137_PEA_1_PEA_1_PEA_1_T3 (SEQ ID NO:114), R35137_PEA_1_PEA_1_PEA_1_T5 (SEQ ID NO:115), R35137_PEA_1_PEA_1_PEA_1_T10 (SEQ ID NO:116), R35137_PEA_1_PEA_1_PEA_1_T11 (SEQ ID NO:117) and R35137_PEA_1_PEA_1_PEA_1_T12 (SEQ ID NO:118). Table 1001 below describes the starting and ending position of this segment on each transcript.

TABLE 1001 Segment location on transcripts
Transcript name                                 Segment starting position   Segment ending position
R35137_PEA_1_PEA_1_PEA_1_T3 (SEQ ID NO: 114)    1010                        1089
R35137_PEA_1_PEA_1_PEA_1_T5 (SEQ ID NO: 115)    1010                        1089
R35137_PEA_1_PEA_1_PEA_1_T10 (SEQ ID NO: 116)   1010                        1089
R35137_PEA_1_PEA_1_PEA_1_T11 (SEQ ID NO: 117)   1010                        1089
R35137_PEA_1_PEA_1_PEA_1_T12 (SEQ ID NO: 118)   1010                        1089

Segment cluster R35137_PEA_1_PEA_1_PEA_1_node_15 (SEQ ID NO:868) according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R35137_PEA_1_PEA_1_PEA_1_T3 (SEQ ID NO:114), R35137_PEA_1_PEA_1_PEA_1_T10 (SEQ ID NO:116) and R35137_PEA_1_PEA_1_PEA_1_T12 (SEQ ID NO:118). Table 1002 below describes the starting and ending position of this segment on each transcript.

TABLE 1002 Segment location on transcripts
Transcript name                                 Segment starting position   Segment ending position
R35137_PEA_1_PEA_1_PEA_1_T3 (SEQ ID NO: 114)    1090                        1156
R35137_PEA_1_PEA_1_PEA_1_T10 (SEQ ID NO: 116)   1090                        1156
R35137_PEA_1_PEA_1_PEA_1_T12 (SEQ ID NO: 118)   1090                        1156

Segment cluster R35137_PEA_1_PEA_1_PEA_1_node_17 (SEQ ID NO:869) according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R35137_PEA_1_PEA_1_PEA_1_T10 (SEQ ID NO:116), R35137_PEA_1_PEA_1_PEA_1_T11 (SEQ ID NO:117) and R35137_PEA_1_PEA_1_PEA_1_T12 (SEQ ID NO:118). Table 1003 below describes the starting and ending position of this segment on each transcript.

TABLE 1003 Segment location on transcripts
Transcript name                                 Segment starting position   Segment ending position
R35137_PEA_1_PEA_1_PEA_1_T10 (SEQ ID NO: 116)   1294                        1393
R35137_PEA_1_PEA_1_PEA_1_T11 (SEQ ID NO: 117)   1227                        1326
R35137_PEA_1_PEA_1_PEA_1_T12 (SEQ ID NO: 118)   1294                        1393

Segment cluster R35137_PEA_1_PEA_1_PEA_1_node_21 (SEQ ID NO:870) according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R35137_PEA_1_PEA_1_PEA_1_T11 (SEQ ID NO:117) and R35137_PEA_1_PEA_1_PEA_1_T12 (SEQ ID NO:118). Table 1004 below describes the starting and ending position of this segment on each transcript.

TABLE 1004 Segment location on transcripts
Transcript name                                 Segment starting position   Segment ending position
R35137_PEA_1_PEA_1_PEA_1_T11 (SEQ ID NO: 117)   1658                        1731
R35137_PEA_1_PEA_1_PEA_1_T12 (SEQ ID NO: 118)   1725                        1798

Segment cluster R35137_PEA_1_PEA_1_PEA_1_node_22 (SEQ ID NO:871) according to the present invention is supported by 31 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R35137_PEA_1_PEA_1_PEA_1_T3 (SEQ ID NO:114), R35137_PEA_1_PEA_1_PEA_1_T5 (SEQ ID NO:115), R35137_PEA_1_PEA_1_PEA_1_T10 (SEQ ID NO:116), R35137_PEA_1_PEA_1_PEA_1_T11 (SEQ ID NO:117) and R35137_PEA_1_PEA_1_PEA_1_T12 (SEQ ID NO:118). Table 1005 below describes the starting and ending position of this segment on each transcript.

TABLE 1005 Segment location on transcripts
Transcript name                                 Segment starting position   Segment ending position
R35137_PEA_1_PEA_1_PEA_1_T3 (SEQ ID NO: 114)    1625                        1697
R35137_PEA_1_PEA_1_PEA_1_T5 (SEQ ID NO: 115)    1558                        1630
R35137_PEA_1_PEA_1_PEA_1_T10 (SEQ ID NO: 116)   1725                        1797
R35137_PEA_1_PEA_1_PEA_1_T11 (SEQ ID NO: 117)   1732                        1804
R35137_PEA_1_PEA_1_PEA_1_T12 (SEQ ID NO: 118)   1799                        1871

Segment cluster R35137_PEA_1_PEA_1_PEA_1_node_23 (SEQ ID NO:872) according to the present invention is supported by 29 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R35137_PEA_1_PEA_1_PEA_1_T3 (SEQ ID NO:114), R35137_PEA_1_PEA_1_PEA_1_T5 (SEQ ID NO:115), R35137_PEA_1_PEA_1_PEA_1_T10 (SEQ ID NO:116), R35137_PEA_1_PEA_1_PEA_1_T11 (SEQ ID NO:117), R35137_PEA_1_PEA_1_PEA_1_T12 (SEQ ID NO:118) and R35137_PEA_1_PEA_1_PEA_1_T14 (SEQ ID NO:119). Table 1006 below describes the starting and ending position of this segment on each transcript.

TABLE 1006 Segment location on transcripts
Transcript name                                 Segment starting position   Segment ending position
R35137_PEA_1_PEA_1_PEA_1_T3 (SEQ ID NO: 114)    1698                        1737
R35137_PEA_1_PEA_1_PEA_1_T5 (SEQ ID NO: 115)    1631                        1670
R35137_PEA_1_PEA_1_PEA_1_T10 (SEQ ID NO: 116)   1798                        1837
R35137_PEA_1_PEA_1_PEA_1_T11 (SEQ ID NO: 117)   1805                        1844
R35137_PEA_1_PEA_1_PEA_1_T12 (SEQ ID NO: 118)   1872                        1911
R35137_PEA_1_PEA_1_PEA_1_T14 (SEQ ID NO: 119)   956                         995

Segment cluster R35137_PEA_1_PEA_1_PEA_1_node_24 (SEQ ID NO:873) according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R35137_PEA_1_PEA_1_PEA_1_T11 (SEQ ID NO:117) and R35137_PEA_1_PEA_1_PEA_1_T12 (SEQ ID NO:118). Table 1007 below describes the starting and ending position of this segment on each transcript.

TABLE 1007 Segment location on transcripts
Transcript name                                 Segment starting position   Segment ending position
R35137_PEA_1_PEA_1_PEA_1_T11 (SEQ ID NO: 117)   1845                        1910
R35137_PEA_1_PEA_1_PEA_1_T12 (SEQ ID NO: 118)   1912                        1977

Segment cluster R35137_PEA_1_PEA_1_PEA_1_node_25 (SEQ ID NO:874) according to the present invention is supported by 30 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R35137_PEA_1_PEA_1_PEA_1_T3 (SEQ ID NO:114), R35137_PEA_1_PEA_1_PEA_1_T5 (SEQ ID NO:115), R35137_PEA_1_PEA_1_PEA_1_T10 (SEQ ID NO:116), R35137_PEA_1_PEA_1_PEA_1_T11 (SEQ ID NO:117), R35137_PEA_1_PEA_1_PEA_1_T12 (SEQ ID NO:118) and R35137_PEA_1_PEA_1_PEA_1_T14 (SEQ ID NO:119). Table 1008 below describes the starting and ending position of this segment on each transcript.

TABLE 1008 Segment location on transcripts
Transcript name                                 Segment starting position   Segment ending position
R35137_PEA_1_PEA_1_PEA_1_T3 (SEQ ID NO: 114)    1738                        1818
R35137_PEA_1_PEA_1_PEA_1_T5 (SEQ ID NO: 115)    1671                        1751
R35137_PEA_1_PEA_1_PEA_1_T10 (SEQ ID NO: 116)   1838                        1918
R35137_PEA_1_PEA_1_PEA_1_T11 (SEQ ID NO: 117)   1911                        1991
R35137_PEA_1_PEA_1_PEA_1_T12 (SEQ ID NO: 118)   1978                        2058
R35137_PEA_1_PEA_1_PEA_1_T14 (SEQ ID NO: 119)   996                         1076

Segment cluster R35137_PEA_1_PEA_1_PEA_1_node_26 (SEQ ID NO:875) according to the present invention is supported by 29 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R35137_PEA_1_PEA_1_PEA_1_T3 (SEQ ID NO:114), R35137_PEA_1_PEA_1_PEA_1_T10 (SEQ ID NO:116), R35137_PEA_1_PEA_1_PEA_1_T11 (SEQ ID NO:117), R35137_PEA_1_PEA_1_PEA_1_T12 (SEQ ID NO:118) and R35137_PEA_1_PEA_1_PEA_1_T14 (SEQ ID NO:119). Table 1009 below describes the starting and ending position of this segment on each transcript.

TABLE 1009 Segment location on transcripts
Transcript name                                 Segment starting position   Segment ending position
R35137_PEA_1_PEA_1_PEA_1_T3 (SEQ ID NO: 114)    1819                        1875
R35137_PEA_1_PEA_1_PEA_1_T10 (SEQ ID NO: 116)   1919                        1975
R35137_PEA_1_PEA_1_PEA_1_T11 (SEQ ID NO: 117)   1992                        2048
R35137_PEA_1_PEA_1_PEA_1_T12 (SEQ ID NO: 118)   2059                        2115
R35137_PEA_1_PEA_1_PEA_1_T14 (SEQ ID NO: 119)   1077                        1133
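The segment coordinates given for transcript T3 (SEQ ID NO:114) in Tables 989 through 1009 can be checked for contiguity with a short sketch. The coordinates below are copied from those tables, with the T3 entry of Table 989 read as 1 to 266; the function name `find_gaps_or_overlaps` is hypothetical and is used only for this illustration.

```python
# Segment coordinates (1-based, inclusive) on transcript T3, from Tables 989-1009.
T3_SEGMENTS = {
    "node_2": (1, 266), "node_3": (267, 432), "node_5": (433, 522),
    "node_7": (523, 631), "node_9": (632, 765), "node_11": (766, 955),
    "node_12": (956, 1009), "node_14": (1010, 1089), "node_15": (1090, 1156),
    "node_16": (1157, 1293), "node_18": (1294, 1468), "node_20": (1469, 1624),
    "node_22": (1625, 1697), "node_23": (1698, 1737), "node_25": (1738, 1818),
    "node_26": (1819, 1875), "node_27": (1876, 3898),
}


def find_gaps_or_overlaps(segments):
    """Return pairs of neighbouring segments that are not exactly adjacent."""
    ordered = sorted(segments.items(), key=lambda item: item[1][0])
    problems = []
    for (prev_name, (_, prev_end)), (name, (start, _)) in zip(ordered, ordered[1:]):
        if start != prev_end + 1:
            problems.append((prev_name, name))
    return problems


print(find_gaps_or_overlaps(T3_SEGMENTS))  # []
```

An empty result indicates that, on these coordinates, the listed segments cover positions 1 to 3898 of transcript T3 without gaps or overlaps.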

Variant Protein Alignment to the Previously Known Protein:

Description for Cluster Z25299

Cluster Z25299 features 5 transcript(s) and 11 segment(s) of interest, the names for which are given in Tables 1010 and 1011, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 1012.

TABLE 1010 Transcripts of interest
Transcript Name    Sequence ID No.
Z25299_PEA_2_T1    120
Z25299_PEA_2_T2    121
Z25299_PEA_2_T3    122
Z25299_PEA_2_T6    123
Z25299_PEA_2_T9    124

TABLE 1011 Segments of interest
Segment Name            Sequence ID No.
Z25299_PEA_2_node_20    876
Z25299_PEA_2_node_21    877
Z25299_PEA_2_node_23    878
Z25299_PEA_2_node_24    879
Z25299_PEA_2_node_8     880
Z25299_PEA_2_node_12    881
Z25299_PEA_2_node_13    882
Z25299_PEA_2_node_14    883
Z25299_PEA_2_node_17    884
Z25299_PEA_2_node_18    885
Z25299_PEA_2_node_19    886

TABLE 1012 Proteins of interest
Protein Name        Sequence ID No.
Z25299_PEA_2_P2     1390
Z25299_PEA_2_P3     1391
Z25299_PEA_2_P7     1392
Z25299_PEA_2_P10    1393

These sequences are variants of the known protein Antileukoproteinase 1 precursor (SwissProt accession identifier ALK1_HUMAN; known also according to the synonyms ALP; HUSI-1; Seminal proteinase inhibitor; Secretory leukocyte protease inhibitor; BLPI; Mucus proteinase inhibitor; MPI; WAP four-disulfide core domain protein 4; Protease inhibitor WAP4), SEQ ID NO: 1454, referred to herein as the previously known protein.

Protein Antileukoproteinase 1 precursor (SEQ ID NO:1454) is known or believed to have the following function(s): Acid-stable proteinase inhibitor with strong affinities for trypsin, chymotrypsin, elastase, and cathepsin G. May prevent elastase-mediated damage to oral and possibly other mucosal tissues. The sequence for protein Antileukoproteinase 1 precursor is given at the end of the application, as “Antileukoproteinase 1 precursor amino acid sequence”. Protein Antileukoproteinase 1 precursor localization is believed to be Secreted.

It has been investigated for clinical/therapeutic use in humans, for example as a target for an antibody or small molecule, and/or as a direct therapeutic; available information related to these investigations is as follows. Potential pharmaceutically related or therapeutically related activity or activities of the previously known protein are as follows: Elastase inhibitor; Tryptase inhibitor. A therapeutic role for a protein represented by the cluster has been predicted. The cluster was assigned this field because there was information in the drug database or the public databases (e.g., described herein above) that this protein, or part thereof, is used or can be used for a potential therapeutic indication: Anti-inflammatory; Antiasthma.

The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: proteinase inhibitor; serine protease inhibitor, which are annotation(s) related to Molecular Function.

The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <dot expasy dot ch/sprot/>; or Locuslink, available from <dot ncbi dot nlm dot nih dot gov/projects/LocusLink/>.

Cluster Z25299 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term “number” in the right hand column of the table and the numbers on the y-axis of FIG. 35 refer to weighted expression of ESTs in each category, as “parts per million” (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
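The “parts per million” ratio described above can be illustrated with a minimal sketch. The weighting applied to the EST expression is described elsewhere in the application and is not reproduced here, so the sketch uses plain counts; the function name `parts_per_million` and the example counts are hypothetical and are not data from the application.

```python
def parts_per_million(cluster_est_count, total_est_count):
    """Express the cluster's share of all ESTs in a category as parts per million."""
    if total_est_count <= 0:
        raise ValueError("total_est_count must be positive")
    return cluster_est_count / total_est_count * 1_000_000


# Hypothetical counts: 93 ESTs assigned to the cluster out of 200,000 ESTs
# in one tissue category.
example = parts_per_million(93, 200_000)
print(round(example))  # 465
```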

Overall, the following results were obtained as shown with regard to the histograms in FIG. 35 and Table 1013. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: brain malignant tumors, a mixture of malignant tumors from different tissues and ovarian carcinoma.

TABLE 1013 Normal tissue distribution
Name of Tissue    Number
bladder           82
bone              6
brain             0
colon             37
epithelial        145
general           73
head and neck     638
kidney            26
liver             68
lung              465
breast            52
ovary             0
pancreas          20
prostate          36
skin              215
stomach           219
uterus            113

TABLE 1014 P values and ratios for expression in cancerous tissue
Name of Tissue    P1        P2        SP1       R3     SP2       R4
bladder           8.2e−01   8.5e−01   9.2e−01   0.6    9.7e−01   0.5
bone              5.5e−01   7.3e−01   4.0e−01   2.1    4.9e−01   1.5
brain             8.8e−02   1.5e−01   2.3e−03   7.7    1.2e−02   4.8
colon             3.3e−01   2.8e−01   4.2e−01   1.6    4.2e−01   1.5
epithelial        2.5e−01   7.6e−01   3.8e−01   1.0    1         0.6
general           6.4e−03   2.5e−01   1.7e−06   1.6    5.2e−01   0.9
head and neck     3.6e−01   5.9e−01   7.6e−01   0.6    1         0.3
kidney            7.4e−01   8.4e−01   2.1e−01   2.1    4.2e−01   1.4
liver             4.1e−01   9.1e−01   4.2e−02   3.2    6.4e−01   0.8
lung              7.6e−01   8.3e−01   9.8e−01   0.5    1         0.3
breast            5.0e−01   5.5e−01   9.8e−02   1.6    3.4e−01   1.1
ovary             3.7e−02   3.0e−02   6.9e−03   6.1    4.9e−03   5.6
pancreas          3.8e−01   3.6e−01   3.6e−01   1.7    3.9e−01   1.5
prostate          9.1e−01   9.2e−01   8.9e−01   0.5    9.4e−01   0.5
skin              6.0e−01   8.1e−01   9.3e−01   0.4    1         0.1
stomach           3.0e−01   8.1e−01   9.1e−01   0.6    1         0.3
uterus            1.6e−01   1.3e−01   3.2e−02   1.6    3.0e−01   1.1
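As an illustration of how the values of Table 1014 might be screened, the sketch below selects tissues whose SP1 value falls below a cutoff. The meaning of the columns P1, P2, SP1, R3, SP2 and R4 and the criteria actually applied are those described earlier in the application; the cutoff of 0.01 and the function name `tissues_below` are assumptions made only for this example.

```python
# Rows of Table 1014 as (P1, P2, SP1, R3, SP2, R4).
TABLE_1014 = {
    "bladder": (8.2e-01, 8.5e-01, 9.2e-01, 0.6, 9.7e-01, 0.5),
    "bone": (5.5e-01, 7.3e-01, 4.0e-01, 2.1, 4.9e-01, 1.5),
    "brain": (8.8e-02, 1.5e-01, 2.3e-03, 7.7, 1.2e-02, 4.8),
    "colon": (3.3e-01, 2.8e-01, 4.2e-01, 1.6, 4.2e-01, 1.5),
    "epithelial": (2.5e-01, 7.6e-01, 3.8e-01, 1.0, 1.0, 0.6),
    "general": (6.4e-03, 2.5e-01, 1.7e-06, 1.6, 5.2e-01, 0.9),
    "head and neck": (3.6e-01, 5.9e-01, 7.6e-01, 0.6, 1.0, 0.3),
    "kidney": (7.4e-01, 8.4e-01, 2.1e-01, 2.1, 4.2e-01, 1.4),
    "liver": (4.1e-01, 9.1e-01, 4.2e-02, 3.2, 6.4e-01, 0.8),
    "lung": (7.6e-01, 8.3e-01, 9.8e-01, 0.5, 1.0, 0.3),
    "breast": (5.0e-01, 5.5e-01, 9.8e-02, 1.6, 3.4e-01, 1.1),
    "ovary": (3.7e-02, 3.0e-02, 6.9e-03, 6.1, 4.9e-03, 5.6),
    "pancreas": (3.8e-01, 3.6e-01, 3.6e-01, 1.7, 3.9e-01, 1.5),
    "prostate": (9.1e-01, 9.2e-01, 8.9e-01, 0.5, 9.4e-01, 0.5),
    "skin": (6.0e-01, 8.1e-01, 9.3e-01, 0.4, 1.0, 0.1),
    "stomach": (3.0e-01, 8.1e-01, 9.1e-01, 0.6, 1.0, 0.3),
    "uterus": (1.6e-01, 1.3e-01, 3.2e-02, 1.6, 3.0e-01, 1.1),
}
SP1_COLUMN = 2  # index of SP1 within each row tuple


def tissues_below(table, column, cutoff):
    """Return the sorted tissue names whose value in `column` is below `cutoff`."""
    return sorted(name for name, row in table.items() if row[column] < cutoff)


# Hypothetical cutoff, chosen only for illustration.
print(tissues_below(TABLE_1014, SP1_COLUMN, 0.01))  # ['brain', 'general', 'ovary']
```

With this illustrative cutoff, the selected rows are brain, general and ovary, which correspond to the pathological conditions named above for this cluster.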

As noted above, cluster Z25299 features 5 transcript(s), which were listed in Table 1010 above. These transcript(s) encode for protein(s) which are variant(s) of protein Antileukoproteinase 1 precursor (SEQ ID NO:1454). A description of each variant protein according to the present invention is now provided.

Variant protein Z25299_PEA_(—)2_P2 (SEQ ID NO:1390) according to thepresent invention has an amino acid sequence as given at the end of theapplication; it is encoded by transcript(s) Z25299_PEA_(—)2_T1 (SEQ IDNO:120). An alignment is given to the known protein (Antileukoproteinase1 precursor (SEQ ID NO:1454)) at the end of the application. One or morealignments to one or more previously published protein sequences aregiven at the end of the application. A brief description of therelationship of the variant protein according to the present inventionto each such aligned protein is as follows:

Comparison report between Z25299_PEA_(—)2_P2 (SEQ ID NO:1390) andALK1_HUMAN (SEQ ID NO:1454):

1. An isolated chimeric polypeptide encoding for Z25299_PEA_2_P2 (SEQ ID NO:1390), comprising a first amino acid sequence being at least 90% homologous to MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPECQSDWQCPGKKRCCPDTCGIKCLDPVDTPNPTRRKPGKCPVTYGQCLMLNPPNFCEMDGQCKRDLKCCMGMCGKSCVSPVK corresponding to amino acids 1-131 of ALK1_HUMAN (SEQ ID NO:1454), which also corresponds to amino acids 1-131 of Z25299_PEA_2_P2 (SEQ ID NO:1390), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GKQGMRAH (SEQ ID NO: 279) corresponding to amino acids 132-139 of Z25299_PEA_2_P2 (SEQ ID NO:1390), wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of Z25299_PEA_2_P2 (SEQ ID NO:1390), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GKQGMRAH (SEQ ID NO: 279) in Z25299_PEA_2_P2 (SEQ ID NO:1390).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein Z25299_PEA_2_P2 (SEQ ID NO:1390) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 1015 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z25299_PEA_2_P2 (SEQ ID NO:1390) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1015
Amino acid mutations
SNP position(s) on amino acid sequence    Alternative amino acid(s)    Previously known SNP?
136                                       M -> T                       Yes
20                                        P ->                         No
43                                        C -> R                       No
48                                        K -> N                       No
83                                        R -> K                       No
84                                        R -> W                       No

Variant protein Z25299_PEA_2_P2 (SEQ ID NO:1390) is encoded by the following transcript(s): Z25299_PEA_2_T1 (SEQ ID NO:120), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z25299_PEA_2_T1 (SEQ ID NO:120) is shown in bold; this coding portion starts at position 124 and ends at position 540. The transcript also has the following SNPs as listed in Table 1016 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z25299_PEA_2_P2 (SEQ ID NO:1390) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1016
Nucleic acid SNPs
SNP position on nucleotide sequence    Alternative nucleic acid    Previously known SNP?
122                                    C -> T                      No
123                                    C -> T                      No
530                                    T -> C                      Yes
989                                    C -> T                      Yes
1127                                   C -> T                      Yes
1162                                   A -> C                      Yes
1180                                   A -> C                      Yes
1183                                   A -> C                      Yes
1216                                   A -> C                      Yes
1262                                   G -> A                      Yes
183                                    T ->                        No
250                                    T -> C                      No
267                                    A -> C                      No
267                                    A -> G                      No
339                                    C -> T                      Yes
371                                    G -> A                      No
373                                    A -> T                      No
435                                    C -> T                      No
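The relationship between the nucleotide SNP positions of Table 1016 and the amino acid positions of Table 1015 can be illustrated by the following minimal sketch. It assumes that transcript positions are 1-based and that the stated coding portion (positions 124 to 540) is read in frame from its first nucleotide; the function name is hypothetical and is given as a non-limiting example only.

```python
from typing import Optional


def amino_acid_position(nt_position: int, cds_start: int, cds_end: int) -> Optional[int]:
    """Map a 1-based transcript position to a 1-based codon (amino acid) index.

    Returns None when the position lies outside the coding portion, for
    example in an untranslated region.
    """
    if not cds_start <= nt_position <= cds_end:
        return None
    return (nt_position - cds_start) // 3 + 1


# Coding portion of transcript Z25299_PEA_2_T1 as stated in the text.
CDS_START, CDS_END = 124, 540

# Nucleotide positions taken from Table 1016.
for nt in (183, 250, 267, 371, 373, 530, 122, 989):
    print(nt, amino_acid_position(nt, CDS_START, CDS_END))
```

Under these assumptions, nucleotide positions 183, 250, 267, 371, 373 and 530 fall in codons 20, 43, 48, 83, 84 and 136, which are the amino acid positions listed in Table 1015, while positions 122 and 989 fall outside the coding portion.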

Variant protein Z25299_PEA_2_P3 (SEQ ID NO:1391) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z25299_PEA_2_T2 (SEQ ID NO:121). An alignment is given to the known protein (Antileukoproteinase 1 precursor (SEQ ID NO:1454)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between Z25299_PEA_2_P3 (SEQ ID NO:1391) and ALK1_HUMAN (SEQ ID NO:1454):

1. An isolated chimeric polypeptide encoding for Z25299_PEA_2_P3 (SEQ ID NO:1391), comprising a first amino acid sequence being at least 90% homologous to MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPECQSDWQCPGKKRCCPDTCGIKCLDPVDTPNPTRRKPGKCPVTYGQCLMLNPPNFCEMDGQCKRDLKCCMGMCGKSCVSPVK corresponding to amino acids 1-131 of ALK1_HUMAN (SEQ ID NO:1454), which also corresponds to amino acids 1-131 of Z25299_PEA_2_P3 (SEQ ID NO:1391), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GEKRHHKQLRDQEVDPLEMRRHSAG (SEQ ID NO: 269) corresponding to amino acids 132-156 of Z25299_PEA_2_P3 (SEQ ID NO:1391), wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of Z25299_PEA_2_P3 (SEQ ID NO:1391), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GEKRHHKQLRDQEVDPLEMRRHSAG (SEQ ID NO: 269) in Z25299_PEA_2_P3 (SEQ ID NO:1391).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein Z25299_PEA_2_P3 (SEQ ID NO:1391) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 1017 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z25299_PEA_2_P3 (SEQ ID NO:1391) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1017
Amino acid mutations
SNP position(s) on amino acid sequence    Alternative amino acid(s)    Previously known SNP?
20                                        P ->                         No
43                                        C -> R                       No
48                                        K -> N                       No
83                                        R -> K                       No
84                                        R -> W                       No

Variant protein Z25299_PEA_2_P3 (SEQ ID NO:1391) is encoded by the following transcript(s): Z25299_PEA_2_T2 (SEQ ID NO:121), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z25299_PEA_2_T2 (SEQ ID NO:121) is shown in bold; this coding portion starts at position 124 and ends at position 591. The transcript also has the following SNPs as listed in Table 1018 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z25299_PEA_2_P3 (SEQ ID NO:1391) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1018
Nucleic acid SNPs
SNP position on nucleotide sequence    Alternative nucleic acid    Previously known SNP?
122                                    C -> T                      No
123                                    C -> T                      No
183                                    T ->                        No
250                                    T -> C                      No
267                                    A -> C                      No
267                                    A -> G                      No
339                                    C -> T                      Yes
371                                    G -> A                      No
373                                    A -> T                      No
435                                    C -> T                      No

Variant protein Z25299_PEA_2_P7 (SEQ ID NO:1392) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z25299_PEA_2_T6 (SEQ ID NO:123). An alignment is given to the known protein (Antileukoproteinase 1 precursor (SEQ ID NO:1454)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between Z25299_PEA_2_P7 (SEQ ID NO:1392) and ALK1_HUMAN (SEQ ID NO:1454):

1. An isolated chimeric polypeptide encoding for Z25299_PEA_2_P7 (SEQ ID NO:1392), comprising a first amino acid sequence being at least 90% homologous to MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPECQSDWQCPGKKRCCPDTCGIKCLDPVDTPNP corresponding to amino acids 1-81 of ALK1_HUMAN (SEQ ID NO:1454), which also corresponds to amino acids 1-81 of Z25299_PEA_2_P7 (SEQ ID NO:1392), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence RGSLGSAQ (SEQ ID NO: 622) corresponding to amino acids 82-89 of Z25299_PEA_2_P7 (SEQ ID NO:1392), wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of Z25299_PEA_2_P7 (SEQ ID NO:1392), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence RGSLGSAQ (SEQ ID NO: 622) in Z25299_PEA_2_P7 (SEQ ID NO:1392).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein Z25299_PEA_2_P7 (SEQ ID NO:1392) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 1019 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z25299_PEA_2_P7 (SEQ ID NO:1392) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1019
Amino acid mutations
SNP position(s) on amino acid sequence    Alternative amino acid(s)    Previously known SNP?
20                                        P ->                         No
43                                        C -> R                       No
48                                        K -> N                       No
82                                        R -> S                       No

Variant protein Z25299_PEA_2_P7 (SEQ ID NO:1392) is encoded by the following transcript(s): Z25299_PEA_2_T6 (SEQ ID NO:123), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z25299_PEA_2_T6 (SEQ ID NO:123) is shown in bold; this coding portion starts at position 124 and ends at position 390. The transcript also has the following SNPs as listed in Table 1020 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z25299_PEA_2_P7 (SEQ ID NO:1392) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1020
Nucleic acid SNPs
SNP position on nucleotide sequence    Alternative nucleic acid    Previously known SNP?
122                                    C -> T                      No
123                                    C -> T                      No
576                                    A -> C                      Yes
594                                    A -> C                      Yes
597                                    A -> C                      Yes
630                                    A -> C                      Yes
676                                    G -> A                      Yes
183                                    T ->                        No
250                                    T -> C                      No
267                                    A -> C                      No
267                                    A -> G                      No
339                                    C -> T                      Yes
369                                    A -> T                      No
431                                    C -> T                      No
541                                    C -> T                      Yes

Variant protein Z25299_PEA_2_P10 (SEQ ID NO:1393) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z25299_PEA_2_T9 (SEQ ID NO:124). An alignment is given to the known protein (Antileukoproteinase 1 precursor (SEQ ID NO:1454)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between Z25299_PEA_2_P10 (SEQ ID NO:1393) and ALK1_HUMAN (SEQ ID NO:1454):

1. An isolated chimeric polypeptide encoding for Z25299_PEA_2_P10 (SEQ ID NO:1393), comprising a first amino acid sequence being at least 90% homologous to MKSSGLFPFLVLLALGTLAPWAVEGSGKSFKAGVCPPKKSAQCLRYKKPECQSDWQCPGKKRCCPDTCGIKCLDPVDTPNPT corresponding to amino acids 1-82 of ALK1_HUMAN (SEQ ID NO:1454), which also corresponds to amino acids 1-82 of Z25299_PEA_2_P10 (SEQ ID NO:1393).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein Z25299_PEA_2_P10 (SEQ ID NO:1393) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 1021 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z25299_PEA_2_P10 (SEQ ID NO:1393) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1021
Amino acid mutations
SNP position(s) on amino acid sequence    Alternative amino acid(s)    Previously known SNP?
20                                        P ->                         No
43                                        C -> R                       No
48                                        K -> N                       No

Variant protein Z25299_PEA_2_P10 (SEQ ID NO:1393) is encoded by the following transcript(s): Z25299_PEA_2_T9 (SEQ ID NO:124), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z25299_PEA_2_T9 (SEQ ID NO:124) is shown in bold; this coding portion starts at position 124 and ends at position 369. The transcript also has the following SNPs as listed in Table 1022 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z25299_PEA_2_P10 (SEQ ID NO:1393) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1022
Nucleic acid SNPs
SNP position on nucleotide sequence    Alternative nucleic acid    Previously known SNP?
122                                    C -> T                      No
123                                    C -> T                      No
451                                    A -> C                      Yes
484                                    A -> C                      Yes
530                                    G -> A                      Yes
183                                    T ->                        No
250                                    T -> C                      No
267                                    A -> C                      No
267                                    A -> G                      No
339                                    C -> T                      Yes
395                                    C -> T                      Yes
430                                    A -> C                      Yes
448                                    A -> C                      Yes

As noted above, cluster Z25299 features 11 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster Z25299_PEA_2_node_20 (SEQ ID NO:876) according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z25299_PEA_2_T1 (SEQ ID NO:120). Table 1023 below describes the starting and ending position of this segment on each transcript.

TABLE 1023
Segment location on transcripts
Transcript name                     Segment starting position    Segment ending position
Z25299_PEA_2_T1 (SEQ ID NO: 120)    518                          1099

Segment cluster Z25299_PEA_2_node_21 (SEQ ID NO:877) according to the present invention is supported by 162 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z25299_PEA_2_T1 (SEQ ID NO:120), Z25299_PEA_2_T6 (SEQ ID NO:123) and Z25299_PEA_2_T9 (SEQ ID NO:124). Table 1024 below describes the starting and ending position of this segment on each transcript.

TABLE 1024
Segment location on transcripts
Transcript name                     Segment starting position    Segment ending position
Z25299_PEA_2_T1 (SEQ ID NO: 120)    1100                         1292
Z25299_PEA_2_T6 (SEQ ID NO: 123)    514                          706
Z25299_PEA_2_T9 (SEQ ID NO: 124)    368                          560
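Because the same segment appears on several transcripts, its length should be the same on each of them. The following minimal sketch checks this for the positions of Table 1024, assuming that the starting and ending positions are 1-based and inclusive; the function and variable names are hypothetical and serve as a non-limiting illustration only.

```python
def segment_length(start: int, end: int) -> int:
    """Length of a segment given 1-based inclusive start and end positions."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


# Positions of segment Z25299_PEA_2_node_21 as listed in Table 1024.
node_21_locations = {
    "Z25299_PEA_2_T1": (1100, 1292),
    "Z25299_PEA_2_T6": (514, 706),
    "Z25299_PEA_2_T9": (368, 560),
}

lengths = {name: segment_length(s, e) for name, (s, e) in node_21_locations.items()}
# A single distinct value means the segment length agrees across transcripts.
is_consistent = len(set(lengths.values())) == 1
```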

Segment cluster Z25299_PEA_2_node_23 (SEQ ID NO:878) according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z25299_PEA_2_T2 (SEQ ID NO:121). Table 1025 below describes the starting and ending position of this segment on each transcript.

TABLE 1025
Segment location on transcripts
Transcript name                     Segment starting position    Segment ending position
Z25299_PEA_2_T2 (SEQ ID NO: 121)    518                          707

Segment cluster Z25299_PEA_2_node_24 (SEQ ID NO:879) according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z25299_PEA_2_T2 (SEQ ID NO:121) and Z25299_PEA_2_T3 (SEQ ID NO:122). Table 1026 below describes the starting and ending position of this segment on each transcript.

TABLE 1026
Segment location on transcripts
Transcript name                     Segment starting position    Segment ending position
Z25299_PEA_2_T2 (SEQ ID NO: 121)    708                          886
Z25299_PEA_2_T3 (SEQ ID NO: 122)    518                          696

Segment cluster Z25299_PEA_2_node_8 (SEQ ID NO:880) according to the present invention is supported by 218 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z25299_PEA_2_T1 (SEQ ID NO:120), Z25299_PEA_2_T2 (SEQ ID NO:121), Z25299_PEA_2_T3 (SEQ ID NO:122), Z25299_PEA_2_T6 (SEQ ID NO:123) and Z25299_PEA_2_T9 (SEQ ID NO:124). Table 1027 below describes the starting and ending position of this segment on each transcript.

TABLE 1027
Segment location on transcripts
Transcript name                     Segment starting position    Segment ending position
Z25299_PEA_2_T1 (SEQ ID NO: 120)    1                            208
Z25299_PEA_2_T2 (SEQ ID NO: 121)    1                            208
Z25299_PEA_2_T3 (SEQ ID NO: 122)    1                            208
Z25299_PEA_2_T6 (SEQ ID NO: 123)    1                            208
Z25299_PEA_2_T9 (SEQ ID NO: 124)    1                            208

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.

Segment cluster Z25299_PEA_2_node_12 (SEQ ID NO:881) according to the present invention is supported by 228 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z25299_PEA_2_T1 (SEQ ID NO:120), Z25299_PEA_2_T2 (SEQ ID NO:121), Z25299_PEA_2_T3 (SEQ ID NO:122), Z25299_PEA_2_T6 (SEQ ID NO:123) and Z25299_PEA_2_T9 (SEQ ID NO:124). Table 1028 below describes the starting and ending position of this segment on each transcript.

TABLE 1028
Segment location on transcripts
Transcript name                     Segment starting position    Segment ending position
Z25299_PEA_2_T1 (SEQ ID NO: 120)    209                          245
Z25299_PEA_2_T2 (SEQ ID NO: 121)    209                          245
Z25299_PEA_2_T3 (SEQ ID NO: 122)    209                          245
Z25299_PEA_2_T6 (SEQ ID NO: 123)    209                          245
Z25299_PEA_2_T9 (SEQ ID NO: 124)    209                          245

Segment cluster Z25299_PEA_2_node_13 (SEQ ID NO:882) according to the present invention is supported by 246 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z25299_PEA_2_T1 (SEQ ID NO:120), Z25299_PEA_2_T2 (SEQ ID NO:121), Z25299_PEA_2_T3 (SEQ ID NO:122), Z25299_PEA_2_T6 (SEQ ID NO:123) and Z25299_PEA_2_T9 (SEQ ID NO:124). Table 1029 below describes the starting and ending position of this segment on each transcript.

TABLE 1029
Segment location on transcripts
Transcript name                     Segment starting position    Segment ending position
Z25299_PEA_2_T1 (SEQ ID NO: 120)    246                          357
Z25299_PEA_2_T2 (SEQ ID NO: 121)    246                          357
Z25299_PEA_2_T3 (SEQ ID NO: 122)    246                          357
Z25299_PEA_2_T6 (SEQ ID NO: 123)    246                          357
Z25299_PEA_2_T9 (SEQ ID NO: 124)    246                          357

Segment cluster Z25299_PEA_2_node_14 (SEQ ID NO:883) according to the present invention can be found in the following transcript(s): Z25299_PEA_2_T1 (SEQ ID NO:120), Z25299_PEA_2_T2 (SEQ ID NO:121), Z25299_PEA_2_T3 (SEQ ID NO:122), Z25299_PEA_2_T6 (SEQ ID NO:123) and Z25299_PEA_2_T9 (SEQ ID NO:124). Table 1030 below describes the starting and ending position of this segment on each transcript.

TABLE 1030
Segment location on transcripts
Transcript name                     Segment starting position    Segment ending position
Z25299_PEA_2_T1 (SEQ ID NO: 120)    358                          367
Z25299_PEA_2_T2 (SEQ ID NO: 121)    358                          367
Z25299_PEA_2_T3 (SEQ ID NO: 122)    358                          367
Z25299_PEA_2_T6 (SEQ ID NO: 123)    358                          367
Z25299_PEA_2_T9 (SEQ ID NO: 124)    358                          367

Segment cluster Z25299_PEA_2_node_17 (SEQ ID NO:884) according to the present invention can be found in the following transcript(s): Z25299_PEA_2_T1 (SEQ ID NO:120), Z25299_PEA_2_T2 (SEQ ID NO:121) and Z25299_PEA_2_T3 (SEQ ID NO:122). Table 1031 below describes the starting and ending position of this segment on each transcript.

TABLE 1031
Segment location on transcripts
Transcript name                     Segment starting position    Segment ending position
Z25299_PEA_2_T1 (SEQ ID NO: 120)    368                          371
Z25299_PEA_2_T2 (SEQ ID NO: 121)    368                          371
Z25299_PEA_2_T3 (SEQ ID NO: 122)    368                          371

Segment cluster Z25299_PEA_2_node_18 (SEQ ID NO:885) according to the present invention is supported by 221 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z25299_PEA_2_T1 (SEQ ID NO:120), Z25299_PEA_2_T2 (SEQ ID NO:121), Z25299_PEA_2_T3 (SEQ ID NO:122) and Z25299_PEA_2_T6 (SEQ ID NO:123). Table 1032 below describes the starting and ending position of this segment on each transcript.

TABLE 1032
Segment location on transcripts
Transcript name                     Segment starting position    Segment ending position
Z25299_PEA_2_T1 (SEQ ID NO: 120)    372                          427
Z25299_PEA_2_T2 (SEQ ID NO: 121)    372                          427
Z25299_PEA_2_T3 (SEQ ID NO: 122)    372                          427
Z25299_PEA_2_T6 (SEQ ID NO: 123)    368                          423

Segment cluster Z25299_PEA_2_node_19 (SEQ ID NO:886) according to the present invention is supported by 197 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z25299_PEA_2_T1 (SEQ ID NO:120), Z25299_PEA_2_T2 (SEQ ID NO:121), Z25299_PEA_2_T3 (SEQ ID NO:122) and Z25299_PEA_2_T6 (SEQ ID NO:123). Table 1033 below describes the starting and ending position of this segment on each transcript.

TABLE 1033
Segment location on transcripts
Transcript name                     Segment starting position    Segment ending position
Z25299_PEA_2_T1 (SEQ ID NO: 120)    428                          517
Z25299_PEA_2_T2 (SEQ ID NO: 121)    428                          517
Z25299_PEA_2_T3 (SEQ ID NO: 122)    428                          517
Z25299_PEA_2_T6 (SEQ ID NO: 123)    424                          513

Variant Protein Alignment to the Previously Known Protein:

Expression of Secretory Leukocyte Protease Inhibitor Acid-Stable Proteinase Inhibitor Z25299 Transcripts, Which are Detectable by Amplicon as Depicted in Sequence Name Z25299 Junc13-14-21 (SEQ ID NO: 1666) in Normal and Cancerous Lung Tissues

Expression of Secretory leukocyte protease inhibitor Acid-stable proteinase inhibitor transcripts detectable by or according to junc13-14-21, Z25299 junc13-14-21 amplicon (SEQ ID NO: 1666) and Z25299 junc13-14-21F (SEQ ID NO: 1664) and Z25299 junc13-14-21R (SEQ ID NO: 1665) primers was measured by real time PCR. In parallel the expression of four housekeeping genes: PBGD (GenBank Accession No. BC019323 (SEQ ID NO:1713); amplicon: PBGD-amplicon, SEQ ID NO:334), HPRT1 (GenBank Accession No. NM_000194 (SEQ ID NO:1714); amplicon: HPRT1-amplicon, SEQ ID NO:1297), Ubiquitin (GenBank Accession No. BC000449 (SEQ ID NO:1711); amplicon: Ubiquitin-amplicon, SEQ ID NO:328) and SDHA (GenBank Accession No. NM_004168 (SEQ ID NO:1712); amplicon: SDHA-amplicon, SEQ ID NO:331) was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 47-50, 90-93, 96-99, Table 2 “Tissue sample in testing panel”, above), to obtain a value of fold differential expression for each sample relative to median of the normal PM samples.
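The normalization described above can be sketched as follows, as a non-limiting illustration only. The function names and all numeric quantities are hypothetical; the sketch assumes that the amplicon quantity and the four housekeeping-gene quantities of a sample are available as positive numbers.

```python
from math import prod
from statistics import median
from typing import Sequence


def geometric_mean(values: Sequence[float]) -> float:
    """Geometric mean of positive quantities."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be a non-empty sequence of positive numbers")
    return prod(values) ** (1.0 / len(values))


def normalized_quantity(amplicon_qty: float, housekeeping_qtys: Sequence[float]) -> float:
    """Amplicon quantity divided by the geometric mean of the housekeeping genes."""
    return amplicon_qty / geometric_mean(housekeeping_qtys)


def fold_differential_expression(sample_norm: float, normal_norms: Sequence[float]) -> float:
    """Normalized sample quantity divided by the median of the normal samples."""
    return sample_norm / median(normal_norms)


# Hypothetical quantities for one sample and three normal samples.
sample = normalized_quantity(6.0, [2.0, 2.0, 2.0, 2.0])
normals = [1.0, 1.5, 2.0]
fold = fold_differential_expression(sample, normals)
```

Dividing by the geometric mean rather than by a single reference gene reduces the influence of any one housekeeping gene that happens to vary between samples.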

FIG. 36 is a histogram showing down regulation of the above-indicated Secretory leukocyte protease inhibitor Acid-stable proteinase inhibitor transcripts in cancerous lung samples relative to the normal samples.

As is evident from FIG. 36, the expression of Secretory leukocyte protease inhibitor Acid-stable proteinase inhibitor transcripts detectable by the above amplicon(s) in cancer samples was significantly lower than in the non-cancerous samples (Sample Nos. 47-50, 90-93, 96-99, Table 2, “Tissue sample in testing panel”).

Statistical analysis was applied to verify the significance of these results, as described below.

The P value for the difference in the expression levels of Secretory leukocyte protease inhibitor Acid-stable proteinase inhibitor transcripts detectable by the above amplicon(s) in lung cancer samples versus the normal tissue samples was determined by T test as 1.98E-04. This value demonstrates statistical significance of the results.

Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: Z25299 junc13-14-21F forward primer (SEQ ID NO: 1664); and Z25299 junc13-14-21R reverse primer (SEQ ID NO: 1665).

The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: Z25299 junc13-14-21 (SEQ ID NO: 1666).

Forward primer (SEQ ID NO: 1664): ACCCCAAACCCAACTTGATTC
Reverse primer (SEQ ID NO: 1665): TCAGTGGTGGAGCCAAGTCTC
Amplicon (SEQ ID NO: 1666): ACCCCAAACCCAACTTGATTCCTGCCATATGGAGGAGGCTCTGGAGTCCTGCTCTGTGTGGTCCAGGTCCTTTCCACCCTGAGACTTGGCTCCACCACTGA
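As a non-limiting illustration, the consistency of a primer pair with its amplicon can be checked as sketched below: the amplicon is expected to begin with the forward primer and to end with the reverse complement of the reverse primer. The helper names are hypothetical; the sequences are those given above for SEQ ID NOs: 1664, 1665 and 1666.

```python
# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def primers_flank_amplicon(forward: str, reverse: str, amplicon: str) -> bool:
    """True if the amplicon starts with the forward primer and ends with
    the reverse complement of the reverse primer."""
    amplicon = amplicon.replace(" ", "").upper()
    return (amplicon.startswith(forward.upper())
            and amplicon.endswith(reverse_complement(reverse)))


forward_primer = "ACCCCAAACCCAACTTGATTC"
reverse_primer = "TCAGTGGTGGAGCCAAGTCTC"
amplicon_seq = (
    "ACCCCAAACCCAACTTGATTCCTGCCATATGGAGGAGGCTCTGGAGTCCTGCTCTGTGTGGTCC"
    "AGGTCCTTTCCACCCTGAGACTTGGCTCCACCACTGA"
)

flanked = primers_flank_amplicon(forward_primer, reverse_primer, amplicon_seq)
```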

Z25299 Transcripts, Which are Detectable by Amplicon as Depicted in Sequence Name Z25299 Seg20 (SEQ ID NO: 1669) in Normal and Cancerous Lung Tissues

Expression of Secretory leukocyte protease inhibitor Acid-stable proteinase inhibitor transcripts detectable by or according to seg20, Z25299 seg20 amplicon (SEQ ID NO: 1669) and Z25299 seg20F (SEQ ID NO: 1667) and Z25299 seg20R (SEQ ID NO: 1668) primers was measured by real time PCR. In parallel the expression of four housekeeping genes: PBGD (GenBank Accession No. BC019323 (SEQ ID NO:1713); amplicon: PBGD-amplicon, SEQ ID NO:334), HPRT1 (GenBank Accession No. NM_000194 (SEQ ID NO:1714); amplicon: HPRT1-amplicon, SEQ ID NO:1297), Ubiquitin (GenBank Accession No. BC000449 (SEQ ID NO:1711); amplicon: Ubiquitin-amplicon, SEQ ID NO:328) and SDHA (GenBank Accession No. NM_004168 (SEQ ID NO:1712); amplicon: SDHA-amplicon, SEQ ID NO:331) was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 47-50, 90-93, 96-99, Table 2, “Tissue samples in testing panel”, above). Then the reciprocal of this ratio was calculated, to obtain a value of fold down-regulation for each sample relative to median of the normal PM samples.

FIG. 37 is a histogram showing down regulation of the above-indicated Secretory leukocyte protease inhibitor Acid-stable proteinase inhibitor transcripts in cancerous lung samples relative to the normal samples. The number and percentage of samples that exhibit at least 5 fold down regulation, out of the total number of samples tested, is indicated at the bottom.

As is evident from FIG. 37, the expression of Secretory leukocyte protease inhibitor Acid-stable proteinase inhibitor transcripts detectable by the above amplicon(s) in cancer samples was significantly lower than in the non-cancerous samples (Sample Nos. 47-50, 90-93, 96-99, Table 2, “Tissue sample in testing panel”). Notably, down regulation of at least 5 fold was found in 6 out of 15 adenocarcinoma samples, 9 out of 16 squamous cell carcinoma samples, 3 out of 4 large cell carcinoma samples and in 8 out of 8 small cell carcinoma samples.
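The fold down-regulation values and the count of samples reaching the 5 fold threshold can be illustrated by the following minimal sketch. The function names and the relative expression values are hypothetical; each input value stands for a sample's normalized quantity already divided by the median of the normal samples.

```python
from typing import Iterable, List


def fold_down_regulation(relative_expression: float) -> float:
    """Reciprocal of the expression ratio relative to the normal median."""
    if relative_expression <= 0:
        raise ValueError("relative_expression must be positive")
    return 1.0 / relative_expression


def count_at_least(folds: Iterable[float], threshold: float = 5.0) -> int:
    """Number of samples whose fold down-regulation meets the threshold."""
    return sum(1 for fold in folds if fold >= threshold)


# Hypothetical ratios for four samples (not data from the experiment).
ratios = [0.125, 0.5, 0.0625, 1.25]
folds: List[float] = [fold_down_regulation(r) for r in ratios]
n_down = count_at_least(folds, threshold=5.0)
```

In this hypothetical example the fold down-regulation values are 8, 2, 16 and 0.8, so two of the four samples reach the threshold.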

Statistical analysis was applied to verify the significance of these results, as described below.

The P value for the difference in the expression levels of Secretory leukocyte protease inhibitor Acid-stable proteinase inhibitor transcripts detectable by the above amplicon(s) in lung cancer samples versus the normal tissue samples was determined by T test as 9.43E-02 in adenocarcinoma, 5.62E-02 in squamous cell carcinoma, 3.38E-01 in large cell carcinoma and 3.78E-02 in small cell carcinoma.

A threshold of 5 fold down regulation was found to differentiate between cancer and normal samples with P value of 3.73E-02 in adenocarcinoma, 1.10E-02 in squamous cell carcinoma, 2.64E-02 in large cell carcinoma and 7.14E-05 in small cell carcinoma, as checked by Fisher's exact test. The above values demonstrate statistical significance of the results.
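A threshold test of this kind can be sketched with a 2x2 table whose rows are cancer and normal samples and whose columns are samples that do or do not reach the threshold. The following is a minimal one-sided Fisher exact test written with the hypergeometric distribution; whether the reported P values are one-sided or two-sided is not stated, so the one-sided form is an assumption, and the counts used are hypothetical and are not the counts of the experiment.

```python
from math import comb


def fisher_exact_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact P value for the table [[a, b], [c, d]].

    Returns the probability of observing a or more samples in the top-left
    cell when the row and column totals are held fixed.
    """
    if min(a, b, c, d) < 0:
        raise ValueError("counts must be non-negative")
    row1 = a + b          # e.g. number of cancer samples
    col1 = a + c          # e.g. number of samples reaching the threshold
    total = a + b + c + d
    upper = min(row1, col1)
    favourable = sum(
        comb(row1, k) * comb(total - row1, col1 - k)
        for k in range(a, upper + 1)
    )
    return favourable / comb(total, col1)


# Hypothetical table: 3 of 4 cancer samples and 1 of 4 normal samples
# reach the threshold.
p_value = fisher_exact_one_sided(3, 1, 1, 3)
```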

Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: Z25299 seg20F forward primer (SEQ ID NO: 1667); and Z25299 seg20R reverse primer (SEQ ID NO: 1668).

The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: Z25299 seg20 (SEQ ID NO: 1669).

Forward primer (SEQ ID NO: 1667): CTCCTGAACCCTACTCCAAGCA
Reverse primer (SEQ ID NO: 1668): CAGGCGATCCTATGGAAATCC
Amplicon (SEQ ID NO: 1669): CTCCTGAACCCTACTCCAAGCACAGCCTCTGTCTGACTCCCTTGTCCTTCAAGAGAACTGTTCTCCAGGTCTCAGGGCCAGGATTTCCATAGGATCGCCTG

Expression of Homo sapiens Secretory Leukocyte Protease Inhibitor (Antileukoproteinase) (SLPI) Z25299 Transcripts Which are Detectable by Amplicon as Depicted in Sequence Name Z25299 seg23 (SEQ ID NO: 1672) in Normal and Cancerous Lung Tissues

Expression of Homo sapiens secretory leukocyte protease inhibitor (antileukoproteinase) (SLPI) transcripts detectable by or according to seg23, Z25299 seg23 amplicon (SEQ ID NO: 1672) and primers Z25299 seg23F (SEQ ID NO: 1670) and Z25299 seg23R (SEQ ID NO: 1671) was measured by real time PCR. In parallel the expression of four housekeeping genes: PBGD (GenBank Accession No. BC019323 (SEQ ID NO:1713); amplicon: PBGD-amplicon, SEQ ID NO:334), HPRT1 (GenBank Accession No. NM_000194 (SEQ ID NO:1714); amplicon: HPRT1-amplicon, SEQ ID NO:1297), Ubiquitin (GenBank Accession No. BC000449 (SEQ ID NO:1711); amplicon: Ubiquitin-amplicon, SEQ ID NO:328) and SDHA (GenBank Accession No. NM_004168 (SEQ ID NO:1712); amplicon: SDHA-amplicon, SEQ ID NO:331), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 47-50, 90-93, 96-99, Table 2, above). Then the reciprocal of this ratio was calculated, to obtain a value of fold down-regulation for each sample relative to median of the normal PM samples.

FIG. 68 is a histogram showing down regulation of the above-indicated Homo sapiens secretory leukocyte protease inhibitor (antileukoproteinase) (SLPI) transcripts in cancerous lung samples relative to the normal samples.

As is evident from FIG. 68, the expression of Homo sapiens secretory leukocyte protease inhibitor (antileukoproteinase) (SLPI) transcripts detectable by the above amplicon(s) in cancer samples was significantly lower than in the non-cancerous samples (Sample Nos. 46-50, 90-93, 96-99, Table 2). Notably, down regulation of at least 10 fold was found in 7 out of 15 adenocarcinoma samples, 9 out of 16 squamous cell carcinoma samples, 3 out of 4 large cell carcinoma samples and in 8 out of 8 small cell carcinoma samples.

Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: Z25299 seg23F forward primer (SEQ ID NO: 1670); and Z25299 seg23R reverse primer (SEQ ID NO: 1671).

The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: Z25299 seg23 (SEQ ID NO: 1672).

Primers:
Forward primer Z25299 seg23F (SEQ ID NO: 1670): CAAGCAATTGAGGGACCAGG
Reverse primer Z25299 seg23R (SEQ ID NO: 1671): CAAAAAACATTGTTAATGAGAGAGATGAC
Amplicon Z25299 seg23 (SEQ ID NO: 1672): CAAGCAATTGAGGGACCAGGAAGTGGATCCTCTAGAGATGAGGAGGCATTCTGCTGGATGACTTTTAAAAATGTTTTCTCCAGAGTCATCTCTCTCATTAACAATGTTTTTTG

Expression of Secretory leukocyte protease inhibitor Acid-stable proteinase inhibitor Z25299 transcripts which are detectable by amplicon as depicted in sequence name Z25299seg20 (SEQ ID NO: 1669) in different normal tissues

Expression of Secretory leukocyte protease inhibitor transcripts detectable by or according to Z25299seg20 amplicon (SEQ ID NO: 1669) and primers Z25299seg20F (SEQ ID NO: 1667) and Z25299seg20R (SEQ ID NO: 1668) was measured by real time PCR. In parallel, the expression of four housekeeping genes, namely RPL19 (GenBank Accession No. NM_000981 (SEQ ID NO:1715); RPL19 amplicon, SEQ ID NO:1630), TATA box (GenBank Accession No. NM_003194 (SEQ ID NO:1716); TATA amplicon, SEQ ID NO:1633), Ubiquitin (GenBank Accession No. BC000449 (SEQ ID NO:1711); amplicon: Ubiquitin-amplicon, SEQ ID NO:328) and SDHA (GenBank Accession No. NM_004168 (SEQ ID NO:1712); amplicon: SDHA-amplicon, SEQ ID NO:331), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the ovary samples (Sample Nos. 18-20, Table 3), to obtain a value of relative expression of each sample relative to the median of the ovary samples.

Primers:
Forward primer (SEQ ID NO: 1667): CTCCTGAACCCTACTCCAAGCA
Reverse primer (SEQ ID NO: 1668): CAGGCGATCCTATGGAAATCC
Amplicon (SEQ ID NO: 1669): CTCCTGAACCCTACTCCAAGCACAGCCTCTGTCTGACTCCCTTGTCCTTCAAGAGAACTGTTCTCCAGGTCTCAGGGCCAGGATTTCCATAGGATCGCCTG

The results are demonstrated in FIG. 69, showing the expression of Secretory leukocyte protease inhibitor Acid-stable proteinase inhibitor Z25299 transcripts which are detectable by amplicon as depicted in sequence name Z25299seg20 (SEQ ID NO: 1669) in different normal tissues.

Expression of Secretory Leukocyte Protease Inhibitor Z25299 Transcripts Which are Detectable by Amplicon as Depicted in Sequence Name Z25299seg23 (SEQ ID NO: 1672) in Different Normal Tissues

Expression of Secretory leukocyte protease inhibitor transcripts detectable by or according to Z25299seg23 amplicon (SEQ ID NO: 1672) and primers Z25299seg23F (SEQ ID NO: 1670) and Z25299seg23R (SEQ ID NO: 1671) was measured by real time PCR. In parallel, the expression of four housekeeping genes, namely RPL19 (GenBank Accession No. NM_000981 (SEQ ID NO:1715); RPL19 amplicon, SEQ ID NO:1630), TATA box (GenBank Accession No. NM_003194 (SEQ ID NO:1716); TATA amplicon, SEQ ID NO:1633), Ubiquitin (GenBank Accession No. BC000449 (SEQ ID NO:1711); amplicon: Ubiquitin-amplicon, SEQ ID NO:328) and SDHA (GenBank Accession No. NM_004168 (SEQ ID NO:1712); amplicon: SDHA-amplicon, SEQ ID NO:331), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the ovary samples (Sample Nos. 18-20, Table 3), to obtain a value of relative expression of each sample relative to the median of the ovary samples.

Primers:
Forward primer Z25299 seg23F (SEQ ID NO: 1670): CAAGCAATTGAGGGACCAGG
Reverse primer Z25299 seg23R (SEQ ID NO: 1671): CAAAAAACATTGTTAATGAGAGAGATGAC
Amplicon Z25299 seg23 (SEQ ID NO: 1672): CAAGCAATTGAGGGACCAGGAAGTGGATCCTCTAGAGATGAGGAGGCATTCTGCTGGATGACTTTTAAAAATGTTTTCTCCAGAGTCATCTCTCTCATTAACAATGTTTTTTG

The results are demonstrated in FIG. 70, showing the expression of Secretory leukocyte protease inhibitor Acid-stable proteinase inhibitor Z25299 transcripts which are detectable by amplicon as depicted in sequence name Z25299seg23 (SEQ ID NO: 1672) in different normal tissues.

Description for Cluster HSSTROL3

Cluster HSSTROL3 features 6 transcript(s) and 16 segment(s) of interest, the names for which are given in Tables 1034 and 1035, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 1036.

TABLE 1034 Transcripts of interest

Transcript Name    Sequence ID No.
HSSTROL3_T5        125
HSSTROL3_T8        126
HSSTROL3_T9        127
HSSTROL3_T10       128
HSSTROL3_T11       129
HSSTROL3_T12       130

TABLE 1035 Segments of interest

Segment Name        Sequence ID No.
HSSTROL3_node_6     887
HSSTROL3_node_10    888
HSSTROL3_node_13    889
HSSTROL3_node_15    890
HSSTROL3_node_19    891
HSSTROL3_node_21    892
HSSTROL3_node_24    893
HSSTROL3_node_25    894
HSSTROL3_node_26    895
HSSTROL3_node_28    896
HSSTROL3_node_29    897
HSSTROL3_node_11    898
HSSTROL3_node_17    899
HSSTROL3_node_18    900
HSSTROL3_node_20    901
HSSTROL3_node_27    902

TABLE 1036 Proteins of interest

Protein Name    Sequence ID No.    Corresponding Transcript(s)
HSSTROL3_P4     1394               HSSTROL3_T5 (SEQ ID NO: 125)
HSSTROL3_P5     1395               HSSTROL3_T8 (SEQ ID NO: 126); HSSTROL3_T9 (SEQ ID NO: 127)
HSSTROL3_P7     1396               HSSTROL3_T10 (SEQ ID NO: 128)
HSSTROL3_P8     1397               HSSTROL3_T11 (SEQ ID NO: 129)
HSSTROL3_P9     1398               HSSTROL3_T12 (SEQ ID NO: 130)

These sequences are variants of the known protein Stromelysin-3 precursor (SwissProt accession identifier MM11_HUMAN; known also according to the synonyms EC 3.4.24.-; Matrix metalloproteinase-11; MMP-11; ST3; SL-3), SEQ ID NO: 1455, referred to herein as the previously known protein.

Protein Stromelysin-3 precursor (SEQ ID NO:1455) is known or believed to have the following function(s): May play an important role in the progression of epithelial malignancies. The sequence for protein Stromelysin-3 precursor is given at the end of the application, as “Stromelysin-3 precursor amino acid sequence”.

The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: proteolysis and peptidolysis; developmental processes; morphogenesis, which are annotation(s) related to Biological Process; stromelysin 3; calcium binding; zinc binding; hydrolase, which are annotation(s) related to Molecular Function; and extracellular matrix, which are annotation(s) related to Cellular Component.

The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <dot expasy dot ch/sprot/>; or Locuslink, available from <dot ncbi dot nlm dot nih dot gov/projects/LocusLink/>.

Cluster HSSTROL3 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term “number” in the left hand column of the table and the numbers on the y-axis of FIG. 38 refer to weighted expression of ESTs in each category, as “parts per million” (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
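The “parts per million” ratio described above can be illustrated with a minimal sketch. The weighting applied to the ESTs is not restated in this section, so the sketch below assumes plain EST counts; the function name and the example counts are hypothetical.

```python
# Illustrative parts-per-million calculation. Assumes unweighted EST counts;
# the function name and example counts are hypothetical.

def expression_ppm(cluster_est_count: int, category_est_count: int) -> float:
    """Ratio of the ESTs of one cluster to all ESTs in a tissue category,
    expressed in parts per million."""
    if category_est_count <= 0:
        raise ValueError("category_est_count must be positive")
    return cluster_est_count * 1_000_000 / category_est_count


if __name__ == "__main__":
    # Hypothetical example: 11 cluster ESTs among 1,000,000 ESTs in a category.
    print(expression_ppm(11, 1_000_000))
```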

Overall, the following results were obtained as shown with regard to the histograms in FIG. 38 and Table 1037. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: transitional cell carcinoma, epithelial malignant tumors, a mixture of malignant tumors from different tissues and pancreas carcinoma.

TABLE 1037 Normal tissue distribution

Name of Tissue    Number
adrenal           0
bladder           0
brain             1
colon             63
epithelial        33
general           13
head and neck     101
kidney            0
lung              11
breast            8
ovary             14
pancreas          0
prostate          2
skin              99
Thyroid           0
uterus            181

TABLE 1038 P values and ratios for expression in cancerous tissue

Name of Tissue    P1         P2         SP1        R3     SP2        R4
adrenal           1          4.6e−01    1          1.0    5.3e−01    1.9
bladder           2.7e−01    3.4e−01    3.3e−03    4.9    2.1e−02    3.3
brain             3.5e−01    2.6e−01    1          1.7    3.3e−01    2.8
colon             7.7e−02    1.5e−01    3.1e−01    1.4    5.2e−01    1.0
epithelial        1.2e−04    1.2e−02    1.3e−06    2.7    4.6e−02    1.4
general           5.4e−09    3.1e−05    1.8e−16    5.0    3.1e−07    2.6
head and neck     4.6e−01    4.3e−01    1          0.6    9.4e−01    0.7
kidney            2.5e−01    3.5e−01    1.1e−01    4.0    2.4e−01    2.8
lung              1.8e−01    4.5e−01    1.9e−01    2.7    5.1e−01    1.4
breast            2.0e−01    3.4e−01    7.3e−02    3.3    2.5e−01    2.0
ovary             2.6e−01    3.2e−01    2.2e−02    2.0    7.0e−02    1.6
pancreas          9.5e−02    1.8e−01    1.8e−04    7.8    1.6e−03    5.5
prostate          8.2e−01    7.8e−01    4.5e−01    1.8    5.6e−01    1.5
skin              5.2e−01    5.8e−01    7.1e−01    0.8    1          0.3
Thyroid           2.9e−01    2.9e−01    1          1.1    1          1.1
uterus            4.2e−01    8.0e−01    7.5e−01    0.6    9.9e−01    0.4

As noted above, cluster HSSTROL3 features 6 transcript(s), which were listed in Table 1034 above. These transcript(s) encode for protein(s) which are variant(s) of protein Stromelysin-3 precursor (SEQ ID NO:1455). A description of each variant protein according to the present invention is now provided.

Variant protein HSSTROL3_P4 (SEQ ID NO:1394) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSSTROL3_T5 (SEQ ID NO:125). An alignment is given to the known protein (Stromelysin-3 precursor (SEQ ID NO:1455)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HSSTROL3_P4 (SEQ ID NO:1394) and MM11_HUMAN (SEQ ID NO:1455)

1. An isolated chimeric polypeptide encoding for HSSTROL3_P4 (SEQ ID NO:1394), comprising a first amino acid sequence being at least 90% homologous to MAPAAWLRSAAARALLPPMLLLLLQPPPLLARALPPDVHHLHAERRGPQPWHAALPSSPAPAPATQEAPRPASSLRPPRCGVPDPSDGLSARNRQKRFVLSGGRWEKTDLTYRILRFPWQLVQEQVRQTMAEALKVWSDVTPLTFTEVHEGRADIMIDFARYW corresponding to amino acids 1-163 of MM11_HUMAN (SEQ ID NO:1455), which also corresponds to amino acids 1-163 of HSSTROL3_P4 (SEQ ID NO:1394), a bridging amino acid H corresponding to amino acid 164 of HSSTROL3_P4 (SEQ ID NO:1394), a second amino acid sequence being at least 90% homologous to GDDLPFDGPGGILAHAFFPKTHREGDVHFDYDETWTIGDDQGTDLLQVAAHEFGHVLGLQHTTAAKALMSAFYTFRYPLSLSPDDCRGVQHLYGQPWPTVTSRTPALGPQAGIDTNEIAPLEPDAPPDACEASFDAVSTIRGELFFFKAGFVWRLRGGQLQPGYPALASRHWQGLPSPVDAAFEDAQGHIWFFQGAQYWVYDGEKPVLGPAPLTELGLVRFPVHAALVWGPEKNKIYFFRGRDYWRFHPSTRRVDSPVPRRATDWRGVPSEIDAAFQDADG corresponding to amino acids 165-445 of MM11_HUMAN (SEQ ID NO:1455), which also corresponds to amino acids 165-445 of HSSTROL3_P4 (SEQ ID NO:1394), and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence ALGVRQLVGGGHSSRFSHLVVAGLPHACHRKSGSSSQVLCPEPSALLSVAG (SEQ ID NO: 251) corresponding to amino acids 446-496 of HSSTROL3_P4 (SEQ ID NO:1394), wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of HSSTROL3_P4 (SEQ ID NO:1394), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ALGVRQLVGGGHSSRFSHLVVAGLPHACHRKSGSSSQVLCPEPSALLSVAG (SEQ ID NO: 251) in HSSTROL3_P4 (SEQ ID NO:1394).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein HSSTROL3_P4 (SEQ ID NO:1394) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 1039 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSSTROL3_P4 (SEQ ID NO:1394) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1039 Amino acid mutations

SNP position(s) on amino acid sequence    Alternative amino acid(s)    Previously known SNP?
38                                        V -> A                       Yes
104                                       R -> P                       Yes
214                                       A ->                         No
323                                       Q -> H                       Yes

Variant protein HSSTROL3_P4 (SEQ ID NO:1394) is encoded by the following transcript(s): HSSTROL3_T5 (SEQ ID NO:125), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSSTROL3_T5 (SEQ ID NO:125) is shown in bold; this coding portion starts at position 24 and ends at position 1511. The transcript also has the following SNPs as listed in Table 1040 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSSTROL3_P4 (SEQ ID NO:1394) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1040 Nucleic acid SNPs

SNP position on nucleotide sequence    Alternative nucleic acid    Previously known SNP?
136                                    T -> C                      Yes
334                                    G -> C                      Yes
663                                    G ->                        No
699                                    -> T                        No
992                                    G -> C                      Yes
1528                                   A -> G                      Yes
1710                                   A -> G                      Yes
2251                                   A -> G                      Yes
2392                                   C ->                        No
2444                                   C -> A                      Yes
2470                                   A -> T                      Yes
2687                                   -> G                        No
2696                                   -> G                        No
2710                                   C ->                        No
2729                                   -> A                        No
2755                                   T -> C                      No
2813                                   A ->                        No
2813                                   A -> C                      No
2963                                   A ->                        No
2963                                   A -> C                      No
2993                                   T -> C                      Yes
3140                                   -> T                        No
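The relationship between the nucleotide SNP positions and the amino acid positions given for HSSTROL3_P4 can be illustrated by simple codon arithmetic. The sketch below assumes that the coding portion of HSSTROL3_T5 runs from position 24 to position 1511 as stated above and that positions are 1-based; the function name is hypothetical, and the sketch is not intended to describe how the tables themselves were generated.

```python
# Illustrative mapping from a nucleotide position on the transcript to the
# 1-based codon (amino acid) number. The function name is hypothetical.
from typing import Optional


def codon_number(nt_position: int, cds_start: int, cds_end: int) -> Optional[int]:
    """Return the 1-based codon number containing nt_position, or None if
    the position lies outside the coding portion."""
    if not cds_start <= nt_position <= cds_end:
        return None
    return (nt_position - cds_start) // 3 + 1


if __name__ == "__main__":
    # Coding portion of HSSTROL3_T5 as stated in the text: 24 to 1511.
    for position in (136, 334, 663, 992, 1528):
        print(position, codon_number(position, 24, 1511))
```

Under these assumptions, nucleotide positions 136, 334, 663 and 992 fall in codons 38, 104, 214 and 323, respectively, while position 1528 lies downstream of the coding portion.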

Variant protein HSSTROL3_P5 (SEQ ID NO:1395) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSSTROL3_T8 (SEQ ID NO:126) and HSSTROL3_T9 (SEQ ID NO:127). An alignment is given to the known protein (Stromelysin-3 precursor (SEQ ID NO:1455)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HSSTROL3_P5 (SEQ ID NO:1395) and MM11_HUMAN (SEQ ID NO:1455)

1. An isolated chimeric polypeptide encoding for HSSTROL3_P5 (SEQ ID NO:1395), comprising a first amino acid sequence being at least 90% homologous to MAPAAWLRSAAARALLPPMLLLLLQPPPLLARALPPDVHHLHAERRGPQPWHAALPSSPAPAPATQEAPRPASSLRPPRCGVPDPSDGLSARNRQKRFVLSGGRWEKTDLTYRILRFPWQLVQEQVRQTMAEALKVWSDVTPLTFTEVHEGRADIMIDFARYW corresponding to amino acids 1-163 of MM11_HUMAN (SEQ ID NO:1455), which also corresponds to amino acids 1-163 of HSSTROL3_P5 (SEQ ID NO:1395), a bridging amino acid H corresponding to amino acid 164 of HSSTROL3_P5 (SEQ ID NO:1395), a second amino acid sequence being at least 90% homologous to GDDLPFDGPGGILAHAFFPKTHREGDVHFDYDETWTIGDDQGTDLLQVAAHEFGHVLGLQHTTAAKALMSAFYTFRYPLSLSPDDCRGVQHLYGQPWPTVTSRTPALGPQAGIDTNEIAPLEPDAPPDACEASFDAVSTIRGELFFFKAGFVWRLRGGQLQPGYPALASRHWQGLPSPVDAAFEDAQGHIWFFQ corresponding to amino acids 165-358 of MM11_HUMAN (SEQ ID NO:1455), which also corresponds to amino acids 165-358 of HSSTROL3_P5 (SEQ ID NO:1395), and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence ELGFPSSTGRDESLEHCRCQGLHK (SEQ ID NO: 252) corresponding to amino acids 359-382 of HSSTROL3_P5 (SEQ ID NO:1395), wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of HSSTROL3_P5 (SEQ ID NO:1395), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence ELGFPSSTGRDESLEHCRCQGLHK (SEQ ID NO: 252) in HSSTROL3_P5 (SEQ ID NO:1395).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein HSSTROL3_P5 (SEQ ID NO:1395) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 1041 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSSTROL3_P5 (SEQ ID NO:1395) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1041 Amino acid mutations

SNP position(s) on amino acid sequence    Alternative amino acid(s)    Previously known SNP?
38                                        V -> A                       Yes
104                                       R -> P                       Yes
214                                       A ->                         No
323                                       Q -> H                       Yes

Variant protein HSSTROL3_P5 (SEQ ID NO:1395) is encoded by the following transcript(s): HSSTROL3_T8 (SEQ ID NO:126) and HSSTROL3_T9 (SEQ ID NO:127), for which the sequence(s) is/are given at the end of the application.

The coding portion of transcript HSSTROL3_T8 (SEQ ID NO:126) is shown in bold; this coding portion starts at position 24 and ends at position 1169. The transcript also has the following SNPs as listed in Table 1042 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSSTROL3_P5 (SEQ ID NO:1395) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1042 Nucleic acid SNPs

SNP position on nucleotide sequence    Alternative nucleic acid    Previously known SNP?
136                                    T -> C                      Yes
334                                    G -> C                      Yes
663                                    G ->                        No
699                                    -> T                        No
992                                    G -> C                      Yes
1903                                   C ->                        No
1955                                   C -> A                      Yes
1981                                   A -> T                      Yes
2198                                   -> G                        No
2207                                   -> G                        No
2221                                   C ->                        No
2240                                   -> A                        No
2266                                   T -> C                      No
2324                                   A ->                        No
2324                                   A -> C                      No
2474                                   A ->                        No
2474                                   A -> C                      No
2504                                   T -> C                      Yes
2651                                   -> T                        No

The coding portion of transcript HSSTROL3_T9 (SEQ ID NO:127) is shown in bold; this coding portion starts at position 24 and ends at position 1169. The transcript also has the following SNPs as listed in Table 1043 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSSTROL3_P5 (SEQ ID NO:1395) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1043 Nucleic acid SNPs

SNP position on nucleotide sequence    Alternative nucleic acid    Previously known SNP?
136                                    T -> C                      Yes
334                                    G -> C                      Yes
663                                    G ->                        No
699                                    -> T                        No
992                                    G -> C                      Yes
1666                                   A -> G                      Yes
1848                                   A -> G                      Yes
2389                                   A -> G                      Yes
2530                                   C ->                        No
2582                                   C -> A                      Yes
2608                                   A -> T                      Yes
2825                                   -> G                        No
2834                                   -> G                        No
2848                                   C ->                        No
2867                                   -> A                        No
2893                                   T -> C                      No
2951                                   A ->                        No
2951                                   A -> C                      No
3101                                   A ->                        No
3101                                   A -> C                      No
3131                                   T -> C                      Yes
3278                                   -> T                        No

Variant protein HSSTROL3_P7 (SEQ ID NO:1396) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSSTROL3_T10 (SEQ ID NO:128). An alignment is given to the known protein (Stromelysin-3 precursor (SEQ ID NO:1455)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HSSTROL3_P7 (SEQ ID NO:1396) and MM11_HUMAN (SEQ ID NO:1455)

1. An isolated chimeric polypeptide encoding for HSSTROL3_P7 (SEQ ID NO:1396), comprising a first amino acid sequence being at least 90% homologous to MAPAAWLRSAAARALLPPMLLLLLQPPPLLARALPPDVHHLHAERRGPQPWHAALPSSPAPAPATQEAPRPASSLRPPRCGVPDPSDGLSARNRQKRFVLSGGRWEKTDLTYRILRFPWQLVQEQVRQTMAEALKVWSDVTPLTFTEVHEGRADIMIDFARYW corresponding to amino acids 1-163 of MM11_HUMAN (SEQ ID NO:1455), which also corresponds to amino acids 1-163 of HSSTROL3_P7 (SEQ ID NO:1396), a bridging amino acid H corresponding to amino acid 164 of HSSTROL3_P7 (SEQ ID NO:1396), a second amino acid sequence being at least 90% homologous to GDDLPFDGPGGILAHAFFPKTHREGDVHFDYDETWTIGDDQGTDLLQVAAHEFGHVLGLQHTTAAKALMSAFYTFRYPLSLSPDDCRGVQHLYGQPWPTVTSRTPALGPQAGIDTNEIAPLEPDAPPDACEASFDAVSTIRGELFFFKAGFVWRLRGGQLQPGYPALASRHWQGLPSPVDAAFEDAQGHIWFFQG corresponding to amino acids 165-359 of MM11_HUMAN (SEQ ID NO:1455), which also corresponds to amino acids 165-359 of HSSTROL3_P7 (SEQ ID NO:1396), and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TTGVSTPAPGV (SEQ ID NO: 253) corresponding to amino acids 360-370 of HSSTROL3_P7 (SEQ ID NO:1396), wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of HSSTROL3_P7 (SEQ ID NO:1396), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TTGVSTPAPGV (SEQ ID NO: 253) in HSSTROL3_P7 (SEQ ID NO:1396).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein HSSTROL3_P7 (SEQ ID NO:1396) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 1044 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSSTROL3_P7 (SEQ ID NO:1396) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1044 Amino acid mutations

SNP position(s) on amino acid sequence    Alternative amino acid(s)    Previously known SNP?
38                                        V -> A                       Yes
104                                       R -> P                       Yes
214                                       A ->                         No
323                                       Q -> H                       Yes

Variant protein HSSTROL3_P7 (SEQ ID NO:1396) is encoded by the following transcript(s): HSSTROL3_T10 (SEQ ID NO:128), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSSTROL3_T10 (SEQ ID NO:128) is shown in bold; this coding portion starts at position 24 and ends at position 1133. The transcript also has the following SNPs as listed in Table 1045 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSSTROL3_P7 (SEQ ID NO:1396) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1045 Nucleic acid SNPs

SNP position on nucleotide sequence    Alternative nucleic acid    Previously known SNP?
136                                    T -> C                      Yes
334                                    G -> C                      Yes
663                                    G ->                        No
699                                    -> T                        No
992                                    G -> C                      Yes
1386                                   A -> G                      Yes
1568                                   A -> G                      Yes
2109                                   A -> G                      Yes
2250                                   C ->                        No
2302                                   C -> A                      Yes
2328                                   A -> T                      Yes
2545                                   -> G                        No
2554                                   -> G                        No
2568                                   C ->                        No
2587                                   -> A                        No
2613                                   T -> C                      No
2671                                   A ->                        No
2671                                   A -> C                      No
2821                                   A ->                        No
2821                                   A -> C                      No
2851                                   T -> C                      Yes
2998                                   -> T                        No

Variant protein HSSTROL3_P8 (SEQ ID NO:1397) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSSTROL3_T11 (SEQ ID NO:129). An alignment is given to the known protein (Stromelysin-3 precursor (SEQ ID NO:1455)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HSSTROL3_P8 (SEQ ID NO:1397) and MM11_HUMAN (SEQ ID NO:1455)

1. An isolated chimeric polypeptide encoding for HSSTROL3_P8 (SEQ ID NO:1397), comprising a first amino acid sequence being at least 90% homologous to MAPAAWLRSAAARALLPPMLLLLLQPPPLLARALPPDVHHLHAERRGPQPWHAALPSSPAPAPATQEAPRPASSLRPPRCGVPDPSDGLSARNRQKRFVLSGGRWEKTDLTYRILRFPWQLVQEQVRQTMAEALKVWSDVTPLTFTEVHEGRADIMIDFARYW corresponding to amino acids 1-163 of MM11_HUMAN (SEQ ID NO:1455), which also corresponds to amino acids 1-163 of HSSTROL3_P8 (SEQ ID NO:1397), a bridging amino acid H corresponding to amino acid 164 of HSSTROL3_P8 (SEQ ID NO:1397), a second amino acid sequence being at least 90% homologous to GDDLPFDGPGGILAHAFFPKTHREGDVHFDYDETWTIGDDQGTDLLQVAAHEFGHVLGLQHTTAAKALMSAFYTFRYPLSLSPDDCRGVQHLYGQPWPTVTSRTPALGPQAGIDTNEIAPLE corresponding to amino acids 165-286 of MM11_HUMAN (SEQ ID NO:1455), which also corresponds to amino acids 165-286 of HSSTROL3_P8 (SEQ ID NO:1397), and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRPCLPVPLLLCWPL (SEQ ID NO: 254) corresponding to amino acids 287-301 of HSSTROL3_P8 (SEQ ID NO:1397), wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of HSSTROL3_P8 (SEQ ID NO:1397), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRPCLPVPLLLCWPL (SEQ ID NO: 254) in HSSTROL3_P8 (SEQ ID NO:1397).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein HSSTROL3_P8 (SEQ ID NO:1397) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 1046 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSSTROL3_P8 (SEQ ID NO:1397) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1046 Amino acid mutations

SNP position(s) on amino acid sequence    Alternative amino acid(s)    Previously known SNP?
38                                        V -> A                       Yes
104                                       R -> P                       Yes
214                                       A ->                         No

Variant protein HSSTROL3_P8 (SEQ ID NO:1397) is encoded by the following transcript(s): HSSTROL3_T11 (SEQ ID NO:129), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSSTROL3_T11 (SEQ ID NO:129) is shown in bold; this coding portion starts at position 24 and ends at position 926. The transcript also has the following SNPs as listed in Table 1047 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSSTROL3_P8 (SEQ ID NO:1397) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1047 Nucleic acid SNPs

SNP position on nucleotide sequence    Alternative nucleic acid    Previously known SNP?
136                                    T -> C                      Yes
334                                    G -> C                      Yes
663                                    G ->                        No
699                                    -> T                        No
935                                    G -> A                      Yes
948                                    G -> A                      Yes
1084                                   G -> C                      Yes
1557                                   C ->                        No
1609                                   C -> A                      Yes
1635                                   A -> T                      Yes
1852                                   -> G                        No
1861                                   -> G                        No
1875                                   C ->                        No
1894                                   -> A                        No
1920                                   T -> C                      No
1978                                   A ->                        No
1978                                   A -> C                      No
2128                                   A ->                        No
2128                                   A -> C                      No
2158                                   T -> C                      Yes
2305                                   -> T                        No
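The coding portion coordinates stated above for transcripts HSSTROL3_T5, T8, T9, T10 and T11 can be cross-checked against the lengths of the corresponding variant proteins as given in the comparison reports (496, 382, 382, 370 and 301 amino acids, respectively). The sketch below is illustrative only; it assumes 1-based inclusive positions and that the stated end position does not include a stop codon, which is an inference from the numbers rather than a statement in the text. The names used are hypothetical.

```python
# Illustrative cross-check of coding-portion coordinates against protein
# lengths. Assumes 1-based inclusive coordinates that exclude the stop codon.

# transcript name -> (coding start, coding end, protein length in amino acids)
CODING_PORTIONS = {
    "HSSTROL3_T5": (24, 1511, 496),   # HSSTROL3_P4
    "HSSTROL3_T8": (24, 1169, 382),   # HSSTROL3_P5
    "HSSTROL3_T9": (24, 1169, 382),   # HSSTROL3_P5
    "HSSTROL3_T10": (24, 1133, 370),  # HSSTROL3_P7
    "HSSTROL3_T11": (24, 926, 301),   # HSSTROL3_P8
}


def codons_in_coding_portion(start: int, end: int) -> int:
    """Number of complete codons between start and end, inclusive."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("coding portion length is not a multiple of 3")
    return length // 3


if __name__ == "__main__":
    for name, (start, end, protein_length) in CODING_PORTIONS.items():
        codons = codons_in_coding_portion(start, end)
        print(name, codons, codons == protein_length)
```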

Variant protein HSSTROL3_P9 (SEQ ID NO:1398) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSSTROL3_T12 (SEQ ID NO:130). An alignment is given to the known protein (Stromelysin-3 precursor (SEQ ID NO:1455)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HSSTROL3_P9 (SEQ ID NO:1398) and MM11_HUMAN (SEQ ID NO:1455)

1. An isolated chimeric polypeptide encoding for HSSTROL3_P9 (SEQ ID NO:1398), comprising a first amino acid sequence being at least 90% homologous to MAPAAWLRSAAARALLPPMLLLLLQPPPLLARALPPDVHHLHAERRGPQPWHAALPSSPAPAPATQEAPRPASSLRPPRCGVPDPSDGLSARNRQK corresponding to amino acids 1-96 of MM11_HUMAN (SEQ ID NO:1455), which also corresponds to amino acids 1-96 of HSSTROL3_P9 (SEQ ID NO:1398), a second amino acid sequence being at least 90% homologous to RILRFPWQLVQEQVRQTMAEALKVWSDVTPLTFTEVHEGRADIMIDFARYW corresponding to amino acids 113-163 of MM11_HUMAN (SEQ ID NO:1455), which also corresponds to amino acids 97-147 of HSSTROL3_P9 (SEQ ID NO:1398), a bridging amino acid H corresponding to amino acid 148 of HSSTROL3_P9 (SEQ ID NO:1398), a third amino acid sequence being at least 90% homologous to GDDLPFDGPGGILAHAFFPKTHREGDVHFDYDETWTIGDDQGTDLLQVAAHEFGHVLGLQHTTAAKALMSAFYTFRYPLSLSPDDCRGVQHLYGQPWPTVTSRTPALGPQAGIDTNEIAPLEPDAPPDACEASFDAVSTIRGELFFFKAGFVWRLRGGQLQPGYPALASRHWQGLPSPVDAAFEDAQGHIWFFQG corresponding to amino acids 165-359 of MM11_HUMAN (SEQ ID NO:1455), which also corresponds to amino acids 149-343 of HSSTROL3_P9 (SEQ ID NO:1398), and a fourth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TTGVSTPAPGV (SEQ ID NO: 253) corresponding to amino acids 344-354 of HSSTROL3_P9 (SEQ ID NO:1398), wherein said first amino acid sequence, second amino acid sequence, bridging amino acid, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order.

2. An isolated chimeric polypeptide encoding for an edge portion of HSSTROL3_P9 (SEQ ID NO:1398), comprising a polypeptide having a length “n”, wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise KR, having a structure as follows: a sequence starting from any of amino acid numbers 96−x to 96; and ending at any of amino acid numbers 97+((n−2)−x), in which x varies from 0 to n−2.

3. An isolated polypeptide encoding for a tail of HSSTROL3_P9 (SEQ ID NO:1398), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TTGVSTPAPGV (SEQ ID NO: 253) in HSSTROL3_P9 (SEQ ID NO:1398).
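The coordinate relationships recited in items 1 to 3 above can be sketched as a small lookup. The following is an illustrative sketch only, not part of the claimed subject matter; the function and constant names are hypothetical, and only the position ranges are taken from the comparison report.

```python
# Aligned blocks from the comparison report:
# (variant_start, variant_end, wild_type_start), all 1-based and inclusive.
ALIGNED_BLOCKS = [
    (1, 96, 1),       # first sequence: P9 1-96 <-> MM11_HUMAN 1-96
    (97, 147, 113),   # second sequence: P9 97-147 <-> MM11_HUMAN 113-163
    (149, 343, 165),  # third sequence: P9 149-343 <-> MM11_HUMAN 165-359
]
BRIDGE_POSITION = 148  # bridging amino acid H (no aligned partner listed)
TAIL = (344, 354)      # tail TTGVSTPAPGV (SEQ ID NO: 253)


def variant_to_wild_type(pos):
    """Return the MM11_HUMAN position aligned to a P9 position, or None
    for the bridging residue and the tail."""
    for v_start, v_end, wt_start in ALIGNED_BLOCKS:
        if v_start <= pos <= v_end:
            return wt_start + (pos - v_start)
    return None


def edge_windows(n, left=96, right=97):
    """Enumerate (start, end) windows of length n spanning the K96-R97 edge:
    start = 96 - x, end = 97 + ((n - 2) - x), for x = 0 .. n - 2."""
    if n < 2:
        raise ValueError("n must be at least 2 to span both edge residues")
    return [(left - x, right + (n - 2) - x) for x in range(n - 1)]


if __name__ == "__main__":
    print(variant_to_wild_type(97))  # first residue after the edge
    print(edge_windows(10))          # all 10-residue windows across the edge
```

Every window returned by `edge_windows` has length n and contains both residue 96 and residue 97, which is the property item 2 relies on.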

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein HSSTROL3_P9 (SEQ ID NO:1398) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 1048 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSSTROL3_P9 (SEQ ID NO:1398) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1048 Amino acid mutations
SNP position(s) on amino acid sequence   Alternative amino acid(s)   Previously known SNP?
38                                       V -> A                      Yes
198                                      A ->                        No
307                                      Q -> H                      Yes

Variant protein HSSTROL3_P9 (SEQ ID NO:1398) is encoded by the following transcript(s): HSSTROL3_T12 (SEQ ID NO:130), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSSTROL3_T12 (SEQ ID NO:130) is shown in bold; this coding portion starts at position 24 and ends at position 1085. The transcript also has the following SNPs as listed in Table 1049 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSSTROL3_P9 (SEQ ID NO:1398) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1049 Nucleic acid SNPs
SNP position on nucleotide sequence   Alternative nucleic acid   Previously known SNP?
136                                   T -> C                     Yes
615                                   G ->                       No
651                                   -> T                       No
944                                   G -> C                     Yes
1275                                  C ->                       No
1327                                  C -> A                     Yes
1353                                  A -> T                     Yes
1570                                  -> G                       No
1579                                  -> G                       No
1593                                  C ->                       No
1612                                  -> A                       No
1638                                  T -> C                     No
1696                                  A ->                       No
1696                                  A -> C                     No
1846                                  A ->                       No
1846                                  A -> C                     No
1876                                  T -> C                     Yes
2023                                  -> T                       No
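The nucleotide positions in Table 1049 can be related to the amino acid positions in Table 1048 through the stated coding portion (positions 24 to 1085). The sketch below is illustrative only; the function name is hypothetical, and it assumes 1-based positions with the first codon beginning at position 24.

```python
CDS_START, CDS_END = 24, 1085  # coding portion of HSSTROL3_T12, 1-based inclusive


def codon_position(nt_pos):
    """Map a transcript position to (amino acid number, position within codon).

    Returns None when the position lies outside the coding portion.
    """
    if not CDS_START <= nt_pos <= CDS_END:
        return None
    offset = nt_pos - CDS_START
    return offset // 3 + 1, offset % 3 + 1


if __name__ == "__main__":
    # Positions of selected SNPs from Table 1049.
    for pos in (136, 615, 944, 1275):
        print(pos, codon_position(pos))
```

Under these assumptions, nucleotide positions 136, 615 and 944 fall in codons 38, 198 and 307, matching the three amino acid positions listed in Table 1048, and the coding portion spans 1062 nucleotides, that is, 354 codons.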

As noted above, cluster HSSTROL3 features 16 segment(s), which were listed in Table 1035 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster HSSTROL3_node_6 (SEQ ID NO:887) according to the present invention is supported by 14 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSSTROL3_T5 (SEQ ID NO:125), HSSTROL3_T8 (SEQ ID NO:126), HSSTROL3_T9 (SEQ ID NO:127), HSSTROL3_T10 (SEQ ID NO:128), HSSTROL3_T11 (SEQ ID NO:129) and HSSTROL3_T12 (SEQ ID NO:130). Table 1050 below describes the starting and ending position of this segment on each transcript.

TABLE 1050 Segment location on transcripts
Transcript name                  Segment starting position   Segment ending position
HSSTROL3_T5 (SEQ ID NO: 125)     1                           131
HSSTROL3_T8 (SEQ ID NO: 126)     1                           131
HSSTROL3_T9 (SEQ ID NO: 127)     1                           131
HSSTROL3_T10 (SEQ ID NO: 128)    1                           131
HSSTROL3_T11 (SEQ ID NO: 129)    1                           131
HSSTROL3_T12 (SEQ ID NO: 130)    1                           131

Segment cluster HSSTROL3_node_10 (SEQ ID NO:888) according to the present invention is supported by 21 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSSTROL3_T5 (SEQ ID NO:125), HSSTROL3_T8 (SEQ ID NO:126), HSSTROL3_T9 (SEQ ID NO:127), HSSTROL3_T10 (SEQ ID NO:128), HSSTROL3_T11 (SEQ ID NO:129) and HSSTROL3_T12 (SEQ ID NO:130). Table 1051 below describes the starting and ending position of this segment on each transcript.

TABLE 1051 Segment location on transcripts
Transcript name                  Segment starting position   Segment ending position
HSSTROL3_T5 (SEQ ID NO: 125)     132                         313
HSSTROL3_T8 (SEQ ID NO: 126)     132                         313
HSSTROL3_T9 (SEQ ID NO: 127)     132                         313
HSSTROL3_T10 (SEQ ID NO: 128)    132                         313
HSSTROL3_T11 (SEQ ID NO: 129)    132                         313
HSSTROL3_T12 (SEQ ID NO: 130)    132                         313

Segment cluster HSSTROL3_node_13 (SEQ ID NO:889) according to the present invention is supported by 36 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSSTROL3_T5 (SEQ ID NO:125), HSSTROL3_T8 (SEQ ID NO:126), HSSTROL3_T9 (SEQ ID NO:127), HSSTROL3_T10 (SEQ ID NO:128), HSSTROL3_T11 (SEQ ID NO:129) and HSSTROL3_T12 (SEQ ID NO:130). Table 1052 below describes the starting and ending position of this segment on each transcript.

TABLE 1052 Segment location on transcripts
Transcript name                  Segment starting position   Segment ending position
HSSTROL3_T5 (SEQ ID NO: 125)     362                         505
HSSTROL3_T8 (SEQ ID NO: 126)     362                         505
HSSTROL3_T9 (SEQ ID NO: 127)     362                         505
HSSTROL3_T10 (SEQ ID NO: 128)    362                         505
HSSTROL3_T11 (SEQ ID NO: 129)    362                         505
HSSTROL3_T12 (SEQ ID NO: 130)    314                         457

Segment cluster HSSTROL3_node_15 (SEQ ID NO:890) according to the present invention is supported by 47 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSSTROL3_T5 (SEQ ID NO:125), HSSTROL3_T8 (SEQ ID NO:126), HSSTROL3_T9 (SEQ ID NO:127), HSSTROL3_T10 (SEQ ID NO:128), HSSTROL3_T11 (SEQ ID NO:129) and HSSTROL3_T12 (SEQ ID NO:130). Table 1053 below describes the starting and ending position of this segment on each transcript.

TABLE 1053 Segment location on transcripts
Transcript name                  Segment starting position   Segment ending position
HSSTROL3_T5 (SEQ ID NO: 125)     506                         639
HSSTROL3_T8 (SEQ ID NO: 126)     506                         639
HSSTROL3_T9 (SEQ ID NO: 127)     506                         639
HSSTROL3_T10 (SEQ ID NO: 128)    506                         639
HSSTROL3_T11 (SEQ ID NO: 129)    506                         639
HSSTROL3_T12 (SEQ ID NO: 130)    458                         591

Segment cluster HSSTROL3_node_19 (SEQ ID NO:891) according to the present invention is supported by 63 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSSTROL3_T5 (SEQ ID NO:125), HSSTROL3_T8 (SEQ ID NO:126), HSSTROL3_T9 (SEQ ID NO:127), HSSTROL3_T10 (SEQ ID NO:128), HSSTROL3_T11 (SEQ ID NO:129) and HSSTROL3_T12 (SEQ ID NO:130). Table 1054 below describes the starting and ending position of this segment on each transcript.

TABLE 1054 Segment location on transcripts
Transcript name                  Segment starting position   Segment ending position
HSSTROL3_T5 (SEQ ID NO: 125)     699                         881
HSSTROL3_T8 (SEQ ID NO: 126)     699                         881
HSSTROL3_T9 (SEQ ID NO: 127)     699                         881
HSSTROL3_T10 (SEQ ID NO: 128)    699                         881
HSSTROL3_T11 (SEQ ID NO: 129)    699                         881
HSSTROL3_T12 (SEQ ID NO: 130)    651                         833

Segment cluster HSSTROL3_node_21 (SEQ ID NO:892) according to the present invention is supported by 61 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSSTROL3_T5 (SEQ ID NO:125), HSSTROL3_T8 (SEQ ID NO:126), HSSTROL3_T9 (SEQ ID NO:127), HSSTROL3_T10 (SEQ ID NO:128), HSSTROL3_T11 (SEQ ID NO:129) and HSSTROL3_T12 (SEQ ID NO:130). Table 1055 below describes the starting and ending position of this segment on each transcript.

TABLE 1055 Segment location on transcripts
Transcript name                  Segment starting position   Segment ending position
HSSTROL3_T5 (SEQ ID NO: 125)     882                         1098
HSSTROL3_T8 (SEQ ID NO: 126)     882                         1098
HSSTROL3_T9 (SEQ ID NO: 127)     882                         1098
HSSTROL3_T10 (SEQ ID NO: 128)    882                         1098
HSSTROL3_T11 (SEQ ID NO: 129)    974                         1190
HSSTROL3_T12 (SEQ ID NO: 130)    834                         1050

Segment cluster HSSTROL3_node_24 (SEQ ID NO:893) according to the present invention is supported by 7 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSSTROL3_T8 (SEQ ID NO:126) and HSSTROL3_T9 (SEQ ID NO:127). Table 1056 below describes the starting and ending position of this segment on each transcript.

TABLE 1056 Segment location on transcripts
Transcript name                  Segment starting position   Segment ending position
HSSTROL3_T8 (SEQ ID NO: 126)     1099                        1236
HSSTROL3_T9 (SEQ ID NO: 127)     1099                        1236

Segment cluster HSSTROL3_node_25 (SEQ ID NO:894) according to the present invention is supported by 13 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSSTROL3_T8 (SEQ ID NO:126). Table 1057 below describes the starting and ending position of this segment on each transcript.

TABLE 1057 Segment location on transcripts
Transcript name                  Segment starting position   Segment ending position
HSSTROL3_T8 (SEQ ID NO: 126)     1237                        1536

Segment cluster HSSTROL3_node_26 (SEQ ID NO:895) according to the present invention is supported by 55 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSSTROL3_T5 (SEQ ID NO:125), HSSTROL3_T8 (SEQ ID NO:126), HSSTROL3_T9 (SEQ ID NO:127) and HSSTROL3_T11 (SEQ ID NO:129). Table 1058 below describes the starting and ending position of this segment on each transcript.

TABLE 1058 Segment location on transcripts
Transcript name                  Segment starting position   Segment ending position
HSSTROL3_T5 (SEQ ID NO: 125)     1099                        1240
HSSTROL3_T8 (SEQ ID NO: 126)     1537                        1678
HSSTROL3_T9 (SEQ ID NO: 127)     1237                        1378
HSSTROL3_T11 (SEQ ID NO: 129)    1191                        1332

Segment cluster HSSTROL3_node_28 (SEQ ID NO:896) according to the present invention is supported by 10 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSSTROL3_T5 (SEQ ID NO:125), HSSTROL3_T9 (SEQ ID NO:127) and HSSTROL3_T10 (SEQ ID NO:128). Table 1059 below describes the starting and ending position of this segment on each transcript.

TABLE 1059 Segment location on transcripts
Transcript name                  Segment starting position   Segment ending position
HSSTROL3_T5 (SEQ ID NO: 125)     1357                        2283
HSSTROL3_T9 (SEQ ID NO: 127)     1495                        2421
HSSTROL3_T10 (SEQ ID NO: 128)    1215                        2141

Segment cluster HSSTROL3_node_29 (SEQ ID NO:897) according to the present invention is supported by 109 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSSTROL3_T5 (SEQ ID NO:125), HSSTROL3_T8 (SEQ ID NO:126), HSSTROL3_T9 (SEQ ID NO:127), HSSTROL3_T10 (SEQ ID NO:128), HSSTROL3_T11 (SEQ ID NO:129) and HSSTROL3_T12 (SEQ ID NO:130). Table 1060 below describes the starting and ending position of this segment on each transcript.

TABLE 1060 Segment location on transcripts
Transcript name                  Segment starting position   Segment ending position
HSSTROL3_T5 (SEQ ID NO: 125)     2284                        3194
HSSTROL3_T8 (SEQ ID NO: 126)     1795                        2705
HSSTROL3_T9 (SEQ ID NO: 127)     2422                        3332
HSSTROL3_T10 (SEQ ID NO: 128)    2142                        3052
HSSTROL3_T11 (SEQ ID NO: 129)    1449                        2359
HSSTROL3_T12 (SEQ ID NO: 130)    1167                        2077

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.

Segment cluster HSSTROL3_node_11 (SEQ ID NO:898) according to the present invention is supported by 25 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSSTROL3_T5 (SEQ ID NO:125), HSSTROL3_T8 (SEQ ID NO:126), HSSTROL3_T9 (SEQ ID NO:127), HSSTROL3_T10 (SEQ ID NO:128) and HSSTROL3_T11 (SEQ ID NO:129). Table 1061 below describes the starting and ending position of this segment on each transcript.

TABLE 1061 Segment location on transcripts
Transcript name                  Segment starting position   Segment ending position
HSSTROL3_T5 (SEQ ID NO: 125)     314                         361
HSSTROL3_T8 (SEQ ID NO: 126)     314                         361
HSSTROL3_T9 (SEQ ID NO: 127)     314                         361
HSSTROL3_T10 (SEQ ID NO: 128)    314                         361
HSSTROL3_T11 (SEQ ID NO: 129)    314                         361

Segment cluster HSSTROL3_node_17 (SEQ ID NO:899) according to the present invention is supported by 45 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSSTROL3_T5 (SEQ ID NO:125), HSSTROL3_T8 (SEQ ID NO:126), HSSTROL3_T9 (SEQ ID NO:127), HSSTROL3_T10 (SEQ ID NO:128), HSSTROL3_T11 (SEQ ID NO:129) and HSSTROL3_T12 (SEQ ID NO:130). Table 1062 below describes the starting and ending position of this segment on each transcript.

TABLE 1062 Segment location on transcripts
Transcript name                  Segment starting position   Segment ending position
HSSTROL3_T5 (SEQ ID NO: 125)     640                         680
HSSTROL3_T8 (SEQ ID NO: 126)     640                         680
HSSTROL3_T9 (SEQ ID NO: 127)     640                         680
HSSTROL3_T10 (SEQ ID NO: 128)    640                         680
HSSTROL3_T11 (SEQ ID NO: 129)    640                         680
HSSTROL3_T12 (SEQ ID NO: 130)    592                         632

Segment cluster HSSTROL3_node_18 (SEQ ID NO:900) according to the present invention can be found in the following transcript(s): HSSTROL3_T5 (SEQ ID NO:125), HSSTROL3_T8 (SEQ ID NO:126), HSSTROL3_T9 (SEQ ID NO:127), HSSTROL3_T10 (SEQ ID NO:128), HSSTROL3_T11 (SEQ ID NO:129) and HSSTROL3_T12 (SEQ ID NO:130). Table 1063 below describes the starting and ending position of this segment on each transcript.

TABLE 1063 Segment location on transcripts
Transcript name                  Segment starting position   Segment ending position
HSSTROL3_T5 (SEQ ID NO: 125)     681                         698
HSSTROL3_T8 (SEQ ID NO: 126)     681                         698
HSSTROL3_T9 (SEQ ID NO: 127)     681                         698
HSSTROL3_T10 (SEQ ID NO: 128)    681                         698
HSSTROL3_T11 (SEQ ID NO: 129)    681                         698
HSSTROL3_T12 (SEQ ID NO: 130)    633                         650

Segment cluster HSSTROL3_node_20 (SEQ ID NO:901) according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSSTROL3_T11 (SEQ ID NO:129). Table 1064 below describes the starting and ending position of this segment on each transcript.

TABLE 1064 Segment location on transcripts
Transcript name                  Segment starting position   Segment ending position
HSSTROL3_T11 (SEQ ID NO: 129)    882                         973

Segment cluster HSSTROL3_node_27 (SEQ ID NO:902) according to the present invention is supported by 50 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSSTROL3_T5 (SEQ ID NO:125), HSSTROL3_T8 (SEQ ID NO:126), HSSTROL3_T9 (SEQ ID NO:127), HSSTROL3_T10 (SEQ ID NO:128), HSSTROL3_T11 (SEQ ID NO:129) and HSSTROL3_T12 (SEQ ID NO:130). Table 1065 below describes the starting and ending position of this segment on each transcript.

TABLE 1065 Segment location on transcripts
Transcript name                  Segment starting position   Segment ending position
HSSTROL3_T5 (SEQ ID NO: 125)     1241                        1356
HSSTROL3_T8 (SEQ ID NO: 126)     1679                        1794
HSSTROL3_T9 (SEQ ID NO: 127)     1379                        1494
HSSTROL3_T10 (SEQ ID NO: 128)    1099                        1214
HSSTROL3_T11 (SEQ ID NO: 129)    1333                        1448
HSSTROL3_T12 (SEQ ID NO: 130)    1051                        1166

Variant protein alignment to the previously known protein:

The data given below show that HSSTROL3 splice variants of the present invention can be used as diagnostic agents for lung cancer. In particular, differential overexpression in lung cancer cells (as opposed to normal lung cells and normal tissue of other types) was demonstrated through determination of mRNA expression, while antibodies selective for HSSTROL3_P9 (SEQ ID NO:1398) splice variant were found to be capable of detecting HSSTROL3_P9 (SEQ ID NO:1398) splice variant in human serum (blood samples), further confirming the existence of HSSTROL3_P9 (SEQ ID NO:1398) splice variant protein. HSSTROL3_P9 (SEQ ID NO:1398) splice variant protein was consistently found to be present in one serum sample taken from a patient with lung cancer and not in samples from any healthy subjects, suggesting differential expression in serum samples derived from lung cancer patients as compared to healthy individuals, thereby supporting the utility of HSSTROL3_P9 (SEQ ID NO:1398) splice variant as a diagnostic agent for lung cancer.

Expression of Stromelysin-3 Precursor HSSTROL3 Transcripts which are Detectable by Amplicon as Depicted in Sequence Name HSSTROL3 seg24 (SEQ ID NO: 1675) in Normal and Cancerous Lung Tissues

Expression of Stromelysin-3 precursor (EC 3.4.24.-) (Matrix metalloproteinase-11) (MMP-11) (ST3) (SL-3) transcripts detectable by or according to seg24, HSSTROL3 seg24 amplicon (SEQ ID NO: 1675) and HSSTROL3 seg24F (SEQ ID NO: 1673) and HSSTROL3 seg24R (SEQ ID NO: 1674) primers was measured by real time PCR. In parallel the expression of four housekeeping genes, namely PBGD (GenBank Accession No. BC019323 (SEQ ID NO:1713); amplicon: PBGD-amplicon, SEQ ID NO:334), HPRT1 (GenBank Accession No. NM_000194 (SEQ ID NO:1714); amplicon: HPRT1-amplicon, SEQ ID NO:1297), Ubiquitin (GenBank Accession No. BC000449 (SEQ ID NO:1711); amplicon: Ubiquitin-amplicon, SEQ ID NO:328) and SDHA (GenBank Accession No. NM_004168 (SEQ ID NO:1712); amplicon: SDHA-amplicon, SEQ ID NO:331), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 47-50, 90-93, 96-99, Table 2 “Tissue samples in testing panel”, above), to obtain a value of fold up-regulation for each sample relative to median of the normal PM samples.
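The normalization just described (division by the geometric mean of the housekeeping gene quantities, followed by division by the median of the normal samples) can be sketched as follows. This is a minimal illustration only; the function names, sample names and all quantities are hypothetical placeholders and are not measurements from the present experiments.

```python
import math
from statistics import median


def geometric_mean(values):
    """Geometric mean of positive quantities."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def normalize(target_quantity, housekeeping_quantities):
    """Normalize the amplicon quantity of one RT sample to the geometric
    mean of its housekeeping gene quantities."""
    return target_quantity / geometric_mean(housekeeping_quantities)


def fold_upregulation(normalized, normal_sample_ids):
    """Divide each normalized quantity by the median of the normal samples."""
    baseline = median(normalized[s] for s in normal_sample_ids)
    return {sample: q / baseline for sample, q in normalized.items()}


# Hypothetical input: sample -> (amplicon quantity, four housekeeping quantities)
raw = {
    "normal_a": (2.0, [1.0, 1.0, 1.0, 1.0]),
    "normal_b": (4.0, [1.0, 1.0, 1.0, 1.0]),
    "normal_c": (6.0, [1.0, 1.0, 1.0, 1.0]),
    "tumor_x": (80.0, [2.0, 2.0, 2.0, 2.0]),
}
normalized = {s: normalize(q, hk) for s, (q, hk) in raw.items()}
folds = fold_upregulation(normalized, ["normal_a", "normal_b", "normal_c"])

if __name__ == "__main__":
    for sample, fold in folds.items():
        print(sample, round(fold, 2))
```

The geometric mean is used rather than the arithmetic mean so that a single highly expressed housekeeping gene does not dominate the normalization factor.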

FIG. 39 is a histogram showing overexpression of the above-indicated Stromelysin-3 precursor transcripts in cancerous lung samples relative to the normal samples. Values represent the average of duplicate experiments. Error bars indicate the minimal and maximal values obtained.

As is evident from FIG. 39, the expression of Stromelysin-3 precursor transcripts detectable by the above amplicon(s) in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 47-50, 90-93, 96-99, Table 2, “Tissue samples in testing panel”). Notably, an overexpression of at least 5 fold was found in 13 out of 15 adenocarcinoma samples, 8 out of 16 squamous cell carcinoma samples, 3 out of 4 large cell carcinoma samples and in 7 out of 8 small cell carcinoma samples.

A threshold of 5 fold overexpression was found to differentiate between cancer and normal samples with a P value of 4.04E-04 in adenocarcinoma, 9.89E-02 in squamous cell carcinoma, 6.04E-02 in large cell carcinoma and 3.14E-03 in small cell carcinoma, as checked by Fisher's exact test. The above values demonstrate statistical significance of the results.
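For illustration, the kind of exact test referred to above can be sketched with a two-sided Fisher's exact test on a 2x2 table of samples above and below the threshold. The sketch below is not the analysis actually performed: the text does not state whether a one-sided or two-sided test was used, nor how many normal samples exceeded the threshold, so the normal counts in the usage example are hypothetical placeholders and the result is not expected to reproduce the reported P values.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the table [[a, b], [c, d]].

    Rows are groups (e.g. cancer, normal); columns are counts above and
    below the overexpression threshold.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def probability(k):
        # Hypergeometric probability of k above-threshold samples in row 1.
        return comb(row1, k) * comb(row2, col1 - k) / denominator

    observed = probability(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one.
    return sum(
        p
        for p in (probability(k) for k in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


if __name__ == "__main__":
    # 13 of 15 adenocarcinoma samples above threshold (from the text);
    # 1 of 12 normal samples above threshold is an ASSUMED placeholder.
    print(fisher_exact_two_sided(13, 2, 1, 11))
```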

Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: HSSTROL3 seg24F forward primer (SEQ ID NO: 1673); and HSSTROL3 seg24R reverse primer (SEQ ID NO: 1674).

The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: HSSTROL3 seg24 (SEQ ID NO: 1675).

Forward Primer (SEQ ID NO: 1673): ATTTCCATCCTCAACTGGCAGA
Reverse Primer (SEQ ID NO: 1674): TGCCCTGGAACCCACG
Amplicon (SEQ ID NO: 1675): ATTTCCATCCTCAACTGGCAGAGATGAGAGCCTGGAGCATTGCAGATGCCAGGGACTTCACAAATGAAGGCACAGCATGGGAAACCTGCGTGGGTTCCAGGGCA
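As a consistency check on the sequences listed above, an amplicon is expected to begin with the forward primer and to end with the reverse complement of the reverse primer. The following sketch illustrates that check; the helper names are hypothetical, and whitespace inside the amplicon string is treated as a line-wrapping artifact and removed.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return sequence.translate(COMPLEMENT)[::-1]


def primers_flank_amplicon(amplicon, forward, reverse):
    """True if the amplicon starts with the forward primer and ends with
    the reverse complement of the reverse primer."""
    return amplicon.startswith(forward) and amplicon.endswith(
        reverse_complement(reverse)
    )


FORWARD = "ATTTCCATCCTCAACTGGCAGA"   # SEQ ID NO: 1673
REVERSE = "TGCCCTGGAACCCACG"         # SEQ ID NO: 1674
# SEQ ID NO: 1675, with any whitespace stripped.
AMPLICON = "".join(
    "ATTTCCATCCTCAACTGGCAGAGATGAGAGCCTGGAGCATTGCAGATGCCAGGGACTTCACAAATGAAGG"
    "CACAGCATGGGAAACCTGCGTGGGTTCCAG GGCA".split()
)

if __name__ == "__main__":
    print(reverse_complement(REVERSE))
    print(primers_flank_amplicon(AMPLICON, FORWARD, REVERSE))
```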

Expression of Stromelysin-3 Precursor HSSTROL3 Transcripts which are Detectable by Amplicon as Depicted in Sequence Name HSSTROL3 seg24 (SEQ ID NO: 1675) in Different Normal Tissues

Expression of Stromelysin-3 precursor transcripts detectable by or according to HSSTROL3 seg24 amplicon (SEQ ID NO: 1675) and primers HSSTROL3 seg24F (SEQ ID NO: 1673) and HSSTROL3 seg24R (SEQ ID NO: 1674) was measured by real time PCR. In parallel the expression of four housekeeping genes, namely Ubiquitin (GenBank Accession No. BC000449 (SEQ ID NO:1711); amplicon: Ubiquitin-amplicon, SEQ ID NO:328), SDHA (GenBank Accession No. NM_004168 (SEQ ID NO:1712); amplicon: SDHA-amplicon, SEQ ID NO:331), RPL19 (GenBank Accession No. NM_000981 (SEQ ID NO:1715); RPL19 amplicon, SEQ ID NO:1630) and TATA box (GenBank Accession No. NM_003194 (SEQ ID NO:1716); TATA amplicon, SEQ ID NO:1633), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the lung samples (Sample Nos. 15-17, Table 2 “Tissue samples in normal panel”, above), to obtain a value of relative expression of each sample relative to median of the lung samples.

Forward Primer (SEQ ID NO: 1673): ATTTCCATCCTCAACTGGCAGA
Reverse Primer (SEQ ID NO: 1674): TGCCCTGGAACCCACG
Amplicon (SEQ ID NO: 1675): ATTTCCATCCTCAACTGGCAGAGATGAGAGCCTGGAGCATTGCAGATGCCAGGGACTTCACAAATGAAGGCACAGCATGGGAAACCTGCGTGGGTTCCAGGGCA

The results are demonstrated in FIG. 40, showing the expression of Stromelysin-3 HSSTROL3 transcripts, which are detectable by amplicon as depicted in sequence name HSSTROL3 seg24 (SEQ ID NO: 1675), in different normal tissues.

Expression of Homo sapiens Matrix Metalloproteinase 11 (Stromelysin 3) (MMP11) HSSTROL3 Transcripts which are Detectable by Amplicon as Depicted in Sequence Name HSSTROL3 seg20-21 (SEQ ID NO: 1678) in Normal and Cancerous Lung Tissues

Expression of Homo sapiens matrix metalloproteinase 11 (stromelysin 3) (MMP11) transcripts detectable by or according to seg20-21, HSSTROL3 seg20-21 amplicon (SEQ ID NO: 1678) and primers HSSTROL3 seg20-21F (SEQ ID NO: 1676) and HSSTROL3 seg20-21R (SEQ ID NO: 1677) was measured by real time PCR. In parallel the expression of four housekeeping genes, namely PBGD (GenBank Accession No. BC019323 (SEQ ID NO:1713); amplicon: PBGD-amplicon, SEQ ID NO:334), HPRT1 (GenBank Accession No. NM_000194 (SEQ ID NO:1714); amplicon: HPRT1-amplicon, SEQ ID NO:1297), Ubiquitin (GenBank Accession No. BC000449 (SEQ ID NO:1711); amplicon: Ubiquitin-amplicon, SEQ ID NO:328) and SDHA (GenBank Accession No. NM_004168 (SEQ ID NO:1712); amplicon: SDHA-amplicon, SEQ ID NO:331), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 47-50, 90-93, 96-99, Table 2, above), to obtain a value of fold up-regulation for each sample relative to median of the normal PM samples.

FIG. 71 is a histogram showing overexpression of the above-indicated Homo sapiens matrix metalloproteinase 11 (stromelysin 3) (MMP11) transcripts in cancerous lung samples relative to the normal samples.

As is evident from FIG. 71, the expression of Homo sapiens matrix metalloproteinase 11 (stromelysin 3) (MMP11) transcripts detectable by the above amplicon(s) in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 46-50, 90-93, 96-99, Table 2). Notably, an overexpression of at least 6 fold was found in 11 out of 15 adenocarcinoma samples, 6 out of 16 squamous cell carcinoma samples, 1 out of 4 large cell carcinoma samples and in 6 out of 8 small cell carcinoma samples.

Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: HSSTROL3 seg20-21F forward primer (SEQ ID NO: 1676); and HSSTROL3 seg20-21R reverse primer (SEQ ID NO: 1677).

The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: HSSTROL3 seg20-21 (SEQ ID NO: 1678).

Forward primer HSSTROL3 seg20-21F (SEQ ID NO: 1676): TCTGCTGGCCACTGTGACTG
Reverse primer HSSTROL3 seg20-21R (SEQ ID NO: 1677): GAAGAAAAAGAGCTCGCCTCG
Amplicon HSSTROL3 seg20-21 (SEQ ID NO: 1678): TCTGCTGGCCACTGTGACTGCAGCATATGCCCTCAGCATGTGTCCCTCTCTCCCACCCCAGCCAGACGCCCCGCCAGATGCCTGTGAGGCCTCCTTTGACGCGGTCTCCACCATCCGAGGCGAGCTCTTTTTCTTC

Expression of Homo sapiens Matrix Metalloproteinase 11 (Stromelysin 3) (MMP11) HSSTROL3 Transcripts which are Detectable by Amplicon as Depicted in Sequence Name HSSTROL3 junc21-27 (SEQ ID NO: 1681) in Normal and Cancerous Lung Tissues

Expression of Homo sapiens matrix metalloproteinase 11 (stromelysin 3) (MMP11) transcripts detectable by or according to junc21-27, HSSTROL3 junc21-27 amplicon (SEQ ID NO: 1681) and primers HSSTROL3 junc21-27F (SEQ ID NO: 1679) and HSSTROL3 junc21-27R (SEQ ID NO: 1680) was measured by real time PCR. In parallel the expression of four housekeeping genes, namely PBGD (GenBank Accession No. BC019323 (SEQ ID NO:1713); amplicon: PBGD-amplicon, SEQ ID NO:334), HPRT1 (GenBank Accession No. NM_000194 (SEQ ID NO:1714); amplicon: HPRT1-amplicon, SEQ ID NO:1297), Ubiquitin (GenBank Accession No. BC000449 (SEQ ID NO:1711); amplicon: Ubiquitin-amplicon, SEQ ID NO:328) and SDHA (GenBank Accession No. NM_004168 (SEQ ID NO:1712); amplicon: SDHA-amplicon, SEQ ID NO:331), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 47-50, 90-93, 96-99, Table 2, above), to obtain a value of fold up-regulation for each sample relative to median of the normal PM samples.

FIG. 72 is a histogram showing overexpression of the above-indicated Homo sapiens matrix metalloproteinase 11 (stromelysin 3) (MMP11) transcripts in cancerous lung samples relative to the normal samples.

As is evident from FIG. 72, the expression of Homo sapiens matrix metalloproteinase 11 (stromelysin 3) (MMP11) transcripts detectable by the above amplicon(s) in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 46-50, 90-93, 96-99, Table 2). Notably, an overexpression of at least 10 fold was found in 15 out of 15 adenocarcinoma samples, 13 out of 16 squamous cell carcinoma samples, 3 out of 4 large cell carcinoma samples and in 5 out of 8 small cell carcinoma samples.

Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: HSSTROL3 junc21-27F forward primer (SEQ ID NO: 1679); and HSSTROL3 junc21-27R reverse primer (SEQ ID NO: 1680).

The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: HSSTROL3 junc21-27 (SEQ ID NO: 1681).

Forward primer HSSTROL3 junc21-27F (SEQ ID NO: 1679): ACATTTGGTTCTTCCAAGGGACTAC
Reverse primer HSSTROL3 junc21-27R (SEQ ID NO: 1680): TCGATCTCAGAGGGCACCC
Amplicon HSSTROL3 junc21-27 (SEQ ID NO: 1681): ACATTTGGTTCTTCCAAGGGACTACTGGCGTTTCCACCCCAGCACCCGGCGTGTAGACAGTCCCGTGCCCCGCAGGGCCACTGACTGGAGAGGGGTGCCCTCTGAGATCGA
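The junc21-27 amplicon listed above can be related to the tail of HSSTROL3_P9 by conceptual translation. The sketch below is illustrative only: it uses the standard genetic code, and the reading-frame offset of 2 is an assumption chosen here because it yields the tail sequence TTGVSTPAPGV (SEQ ID NO: 253); the helper names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame_offset=0):
    """Translate DNA from frame_offset up to (not including) the first stop."""
    protein = []
    for i in range(frame_offset, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# SEQ ID NO: 1681, with any whitespace stripped.
AMPLICON = "".join(
    "ACATTTGGTTCTTCCAAGGGACTACTGGCGTTTCCACCCCAGCACCCGGCGTGTAGACAGTCCCGTGCC"
    "CCGCAGGGCCACTGACTGGAGAGGGGTGCCC TCTGAGATCGA".split()
)
TAIL = "TTGVSTPAPGV"  # SEQ ID NO: 253

if __name__ == "__main__":
    peptide = translate(AMPLICON, frame_offset=2)  # assumed reading frame
    print(peptide, peptide.endswith(TAIL))
```

In this frame the translated peptide ends with the tail sequence and is immediately followed by a stop codon, consistent with the tail occupying the last 11 positions (344-354) of the variant protein.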

FIG. 72 is a histogram showing overexpression of the Homo sapiens matrix metalloproteinase 11 (stromelysin 3) (MMP11) HSSTROL3 transcripts which are detectable by amplicon as depicted in sequence name HSSTROL3 junc21-27 (SEQ ID NO:1681) in cancerous lung samples relative to the normal samples. The transcript HSSTROL3_T12 splice variant (SEQ ID NO:130) was shown to be specifically differentially overexpressed in lung cancer tissue samples. The junction HSSTROL3 junc21-27 (SEQ ID NO:1681) between two nodes is unique to this polynucleotide and hence shows that this protein would be predicted to be overexpressed in lung cancer. It should be noted for the sake of completeness that this junction is present also in one other sequence, HSSTROL3_T10 (SEQ ID NO:128); however, only SEQ ID NO:130 was verified as being expressed as a full length sequence. The full length mRNA identical to SEQ ID NO:130 was published (after the priority date of the present application) in GenBank with accession number AK075448 [gi:22761543].
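The occurrence of the node 21 to node 27 junction can be cross-checked against the segment location tables: the junction exists in a transcript only where node 27 begins at the position immediately following the end of node 21. The following sketch is illustrative; the positions are taken from Tables 1055 and 1065 above, and the function name is hypothetical.

```python
# Ending position of node 21 on each transcript (Table 1055).
NODE21_END = {
    "HSSTROL3_T5": 1098,
    "HSSTROL3_T8": 1098,
    "HSSTROL3_T9": 1098,
    "HSSTROL3_T10": 1098,
    "HSSTROL3_T11": 1190,
    "HSSTROL3_T12": 1050,
}
# Starting position of node 27 on each transcript (Table 1065).
NODE27_START = {
    "HSSTROL3_T5": 1241,
    "HSSTROL3_T8": 1679,
    "HSSTROL3_T9": 1379,
    "HSSTROL3_T10": 1099,
    "HSSTROL3_T11": 1333,
    "HSSTROL3_T12": 1051,
}


def transcripts_with_junction(upstream_end, downstream_start):
    """Return transcripts in which the downstream node starts directly
    after the upstream node ends (i.e. the two nodes are adjacent)."""
    return sorted(
        name
        for name, end in upstream_end.items()
        if downstream_start.get(name) == end + 1
    )


if __name__ == "__main__":
    print(transcripts_with_junction(NODE21_END, NODE27_START))
```

With the tabulated positions, only HSSTROL3_T10 (SEQ ID NO:128) and HSSTROL3_T12 (SEQ ID NO:130) have node 27 directly adjacent to node 21.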

1. HSSTROL3_P9 (SEQ ID NO:1398) Splice Variant is Detected in Serum Samples.

Antibodies were raised against peptides corresponding to HSSTROL3_P9 (SEQ ID NO:1398) splice variant. Antibodies raised against HSSTROL3_P9 (SEQ ID NO:1398) splice variant showed that HSSTROL3_P9 (SEQ ID NO:1398) splice variant protein was consistently found to be present in one serum sample taken from a patient with a small cell lung carcinoma and not in samples from any healthy subjects, suggesting differential expression in serum samples derived from lung cancer patients as compared to healthy individuals, thereby supporting the utility of HSSTROL3_P9 (SEQ ID NO:1398) splice variant as a diagnostic agent for lung cancer. The experiments were performed as described in greater detail below.

As a tool for antibody development and ELISA assay development, both recombinant HSSTROL3_P9 (SEQ ID NO:1398) splice variant (MMP11_354) and wild type (WT) MMP11 (SEQ ID NO:1455) (MMP11_488) proteins were produced. The two genes were originally cloned into mammalian vectors, and then corresponding DNA fragments were transferred from the mammalian vectors into bacterial expression vectors. The protein was produced and purified from bacterial cells.

1.1 Cloning and Expression of HSSTROL3_P9 (SEQ ID NO:1398) and WT MMP11 (SEQ ID NO:1455).

1.1.1 Cloning of HSSTROL3_P9 (SEQ ID NO:1398) and WT MMP11 (SEQ ID NO:1455)

The following sequences were codon optimized to boost protein expression in a mammalian system: the active domain of WT MMP11 (SEQ ID NO:1455) (amino acids 114-end, (SEQ ID NO:1782)), and the active domain of HSSTROL3_P9 (SEQ ID NO:1398) (amino acids 98-end, (SEQ ID NO:1783)). In addition, bacterial low usage codons were eliminated to enable bacterial expression of the variants using the same sequences.

The optimized genes were synthesized by GeneArt (Germany) by using their proprietary gene synthesis technology with the addition of DNA sequences encoding the His-tag downstream to the ectopic IL6 signal peptide. The His tag protein sequence was added in order to allow an easier purification of the expressed proteins. The resultant DNA sequences of HSSTROL3_P9 (SEQ ID NO: 1783) (MMP11_354) and WT MMP11 (SEQ ID NO: 1782) (MMP11_488) including the tag sequence are shown in FIG. 86; while the amino acid sequences are shown in FIG. 87 (SEQ ID NO: 1785 and SEQ ID NO: 1784, respectively). The DNA fragments were cloned into EcoRI/NotI sites of pIRESpuro3 (Clontech, cat #PT3646-5) (FIG. 88) and the sequences were verified.

1.1.2 Bacterial Cloning and Expression of MMP11 Proteins

WT MMP11 (MMP11_488) and HSSTROL3_P9 (MMP11_354) inserts, encoding WT MMP11 (MMP11_488) (SEQ ID NO:1786) and HSSTROL3_P9 (MMP11_354) (SEQ ID NO:1787), were isolated from MMP11_488 pIRESpuro3 and MMP11_354 pIRESpuro3, respectively, by double digestion with NcoI and NotI. The sites are marked in the sequences in FIG. 86, and by arrows in FIG. 89.

The inserts were ligated to pET28 previously digested with the sameenzymes (plasmid maps and protein sequences are given in FIGS. 89 and 90respectively). The ligation mix was transformed into DH5alpha competentcells. The transformation solutions were plated on selective LB platescontaining Kanamycin. Several colonies from each transcript clone thatgrew on the selective plates were taken for further analysis byre-plating on a selective plate and by restriction enzyme analysis.

DNA from positive clones was extracted and transformed into BL21 codonplus (DE3) RIL competent cells (Stratagene Cat no. 230245). Small scaleexpression was performed following induction with 1 mM IPTG at 37° C.for 3 hrs. Expression of the recombinant proteins was detected in thewhole cell lysates, both by Coomassie staining (FIG. 91) and by Westernblot (FIG. 92) using anti-His antibodies (Serotec, Cat. #MCA1396).

1.2. Bacterial Production of HSSTROL3_P9 (SEQ ID NO:1787) and WT MMP11 (SEQ ID NO:1786).

Bacterial cultures expressing WT MMP11 (SEQ ID NO:1786) and HSSTROL3_P9 (SEQ ID NO:1787) (pET28, BL21+codon) were prepared as described above, and 50 μl of each culture was used to start production. The cultures were propagated overnight in 50 ml LB medium supplemented with selection antibiotics (Kanamycin 10 ug/ml, Chloramphenicol 34 ug/ml) at 37° C., 200 rpm, and then expanded to a final volume of 1 L each. After a few hours, when the cultures reached an OD₆₀₀ of 0.5-0.7, induction was carried out with 1 mM IPTG. Three hours after induction, when the cells had reached a density of 1.3-1.4 OD₆₀₀, the cultures were centrifuged at 6000 g for 10 min and the supernatant was discarded. Cell pellets were stored at −20° C. until purification.

1.3 Purification of MMP11_354 (HSSTROL3_P9 (SEQ ID NO:1787)) and MMP11_488 (WT MMP11) (SEQ ID NO:1786).

1.3.1. Purification of HSSTROL3_P9 (MMP11_354) (SEQ ID NO:1787)

The bacterial cell pellet of a 1 liter culture expressing HSSTROL3_P9 (MMP11_354, (SEQ ID NO:1787)), prepared as described above, was re-suspended in 50 ml of lysis buffer (50 mM Tris pH 7.5, 100 mM KCl, 0.5% triton x100, 0.1 mg/ml lysozyme) and incubated for 1 hour at room temperature. The cells were further disrupted by sonication on ice (Misonix XL2020, microtip). The inclusion bodies were collected by centrifugation and washed 3 times with 30 ml wash buffer 1 (50 mM tris pH 7.5, 2M NaCl, 0.5% triton) and then twice with 30 ml wash buffer 2 (50 mM tris pH 7.5), by re-suspension and centrifugation as described above.

Washed inclusion bodies were resuspended in 1/20 of the original culture volume of 8M urea buffer (8M urea, 50 mM tris, 10 mM DTT, pH 8.5) and incubated for 2 hours at RT. The dissolved inclusion bodies were diluted ×10 in binding buffer (8M urea, 50 mM tris, 300 mM NaCl, 20 mM imidazole) and incubated for 17 hours at 37° C. with Ni-NTA Superflow beads (Ni-NTA Superflow®, IBA) that had been equilibrated with 5 column volumes (CV) of WFI followed by 10 CV of binding buffer with 1 mM DTT. The beads were packed in an XK16 column and washed with binding buffer containing 1 mM DTT. The bound protein was eluted with elution buffer (8M urea, 50 mM Tris, 0.3M NaCl, 1 mM DTT, 0.25M imidazole, pH 8.0).

The eluted protein was diluted ×8.3 with binding buffer + 1 mM DTT and refolded gradually by dialysis against buffer containing decreasing urea concentrations in 50 mM tris pH 8.5, 100 mM NaCl, 10 mM CaCl₂ and 100 μM ZnCl₂. The final buffer pH was adjusted to 7.4.

After dialysis the refolded protein was filtered through a 0.22 μm filter and concentrated ×5 on a 10,000 MWCO membrane (Amicon, Cat#PBGC06210). The concentrated protein was centrifuged to eliminate aggregates.

A sample of the purified protein was analyzed by SDS-PAGE stained by Coomassie (not shown). The identity of the proteins was verified by LC-MS/MS.

1.3.2. Purification of WT MMP11 (MMP11_488, (SEQ ID NO:1786))

The bacterial cell pellet of a 1 liter culture expressing WT MMP11 (MMP11_488) (SEQ ID NO:1786), prepared as described above, was re-suspended in 50 ml of lysis buffer (50 mM Tris pH 7.5, 100 mM KCl, 0.5% triton x100, 0.1 mg/ml lysozyme) and incubated for 1 hour at room temperature. The cells were further disrupted by sonication on ice (Misonix XL2020, microtip). The inclusion bodies were collected by centrifugation and washed 3 times with 30 ml wash buffer 1 (50 mM tris pH 7.5, 2M NaCl, 0.5% triton) and then twice with 30 ml wash buffer 2 (50 mM tris pH 7.5), by re-suspension and centrifugation as described above.

Washed inclusion bodies were resuspended in 1/20 of the original culture volume of 8M urea buffer (8M urea, 50 mM tris, 10 mM DTT, pH 8.5) and incubated for 2 hours at RT. The dissolved inclusion bodies were diluted ×10 in binding buffer (8M urea, 50 mM tris, 300 mM NaCl, 20 mM imidazole) and incubated for 17 hours at 37° C. with Ni-NTA Superflow beads (Ni-NTA Superflow®, IBA) that had been equilibrated with 5 column volumes (CV) of WFI followed by 10 CV of binding buffer with 1 mM DTT. The beads were packed in an XK16 column and washed with binding buffer containing 1 mM DTT. The bound protein was eluted with elution buffer (8M urea, 50 mM Tris, 0.3M NaCl, 1 mM DTT, 0.25M imidazole, pH 8.0).

The eluted protein was treated with 10 mM DTT for 30 min at room temperature and then diluted gradually ×8 with dilution buffer (0.5M arginine, 50 mM tris pH 8.5, 100 mM NaCl, 5 mM CaCl₂, 1 μM ZnCl₂, 5% glycerol, 0.5% Tween 20, 1 mM DTT pH 9). Following dialysis against the dilution buffer, the protein was dialysed against the final buffer containing 50 mM Tris pH 7.4, 100 mM NaCl, 5 mM CaCl₂, 1 μM ZnCl₂, 1 mM DTT.

After dialysis the refolded protein was filtered through a 0.22 um filter and concentrated ×3-5 on a 10,000 MWCO membrane, and the concentrated protein was centrifuged to eliminate aggregates. A sample of the purified protein was analyzed by SDS-PAGE stained by Coomassie (not shown). The identity of the proteins was verified by LC-MS/MS.

2 Antibody Development

In order to test the HSSTROL3_P9 (SEQ ID NO:1398) protein expression pattern in serum samples of diseased and healthy individuals, both monoclonal and polyclonal antibodies were developed that had sufficient binding specificity to permit the specific analysis of this protein.

The antibody of interest had to recognize HSSTROL3_P9 (SEQ ID NO:1398) without recognizing WT MMP11 (SEQ ID NO:1455). Therefore, serum titers as well as resultant antibodies were tested against both protein preparations following a successful recognition of the immunogen.

2.1 Peptide Design and Synthesis

One peptide was selected as immunogen for monoclonal and polyclonal antibody development for the unique splice variant. The peptide sequence of the HSSTROL3_P9 (SEQ ID NO:1398) unique tail was used as a template.

Selected immunogen: The primary sequence of the immunogen peptide (CGEN6301, SEQ ID NO: 1781) is shown below. The terminal cysteine residue was used to facilitate coupling via m-maleimidobenzoyl-N-hydroxysuccinimide ester (MBS) to KLH. Ahx stands for 6-aminohexanoic acid. Peptide CGEN6301 (SEQ ID NO: 1781): CKK-Ahx-FFQGTTGVSTPAPGV

The peptide represents the C terminus of the protein; therefore the C-terminus of the immunogen was left unblocked. The peptide immunogen indicated above, overlaid on the primary sequence of the protein (SEQ ID NO: 1398), is shown in FIG. 93.

The immunogen peptide was synthesized using a conventional technology (50 mg; purity ≥90%). The peptide was conjugated to Keyhole Limpet Hemocyanin (KLH) and Bovine Serum Albumin (BSA) using an m-maleimidobenzoyl-N-hydroxysuccinimide ester (MBS) linker.

2.2 Rabbit Polyclonal Antibody Development

2.2.1. Rabbit Immunization and Sera Testing

Three New Zealand White Rabbits (referred to herein by number as 8350, 8351 and 8352) were immunized with CGEN6301 conjugated with KLH. Immunization schedule and production bleed schedules are summarized in Tables 1066 and 1067, respectively.

TABLE 1066 Summary of rabbit immunization and test bleed schedule. Scheduled Date

| Rabbit # | Pre Bleed | Initial Injection (500 μg ID/CFA) | Boost #1 (250 μg ID/IFA) | Boost #2 (250 μg SC/IFA) | Boost #3 (250 μg SC/IFA) | Test Bleed #1 |
|---|---|---|---|---|---|---|
| 8350 | Jun. 12, 2006 | Jun. 16, 2006 | Jun. 23, 2006 | Jun. 30, 2006 | Jul. 14, 2006 | Jul. 24, 2006 |
| 8351 | Jun. 12, 2006 | Jun. 16, 2006 | Jun. 23, 2006 | Jun. 30, 2006 | Jul. 14, 2006 | Jul. 24, 2006 |
| 8352 | Jun. 12, 2006 | Jun. 16, 2006 | Jun. 23, 2006 | Jun. 30, 2006 | Jul. 14, 2006 | * |

\* Rabbit 8352 expired on Jul. 21, 2006

TABLE 1067 Summary of rabbit production bleed schedule. Scheduled Date

| Rabbit # | Production Bleed #1 | Production Bleed #2 | Production Bleed #3 | Production Bleed #4 | Production Bleed #5 | Production Bleed #6 | Production Bleed #7 | Production Bleed #8 | Terminal Bleed |
|---|---|---|---|---|---|---|---|---|---|
| 8350 | Aug. 3, 2006 | Aug. 14, 2006 | Aug. 21, 2006 | Sep. 4, 2006 | Sep. 11, 2006 | Sep. 18, 2006 | Sep. 25, 2006 | Oct. 2, 2006 | Nov. 6, 2006 |
| 8351 | Aug. 3, 2006 | Aug. 14, 2006 | Aug. 21, 2006 | Sep. 4, 2006 | Sep. 11, 2006 | Sep. 18, 2006 | Sep. 25, 2006 | Oct. 2, 2006 | Nov. 6, 2006 |

Production bleeds were collected and antibody titers were determined by ELISA using CGEN6301 peptide conjugated to BSA, recombinant HSSTROL3_P9 (SEQ ID NO:1787) splice variant and WT MMP11 (SEQ ID NO:1786) (not shown). Rabbit 8352 expired on Jul. 21, 2006; therefore, no test bleed and no production bleeds were collected from this rabbit.

2.2.2 Rabbit Polyclonal Antibody Affinity Purification

Affinity purification was performed on all production bleeds collected from the two rabbits (8350 and 8351) using a CGEN6301 immunoaffinity resin. Two passes of PBS diluted antiserum (1:1) were run on immunoaffinity resin prepared by coupling 10 mg of the CGEN6301 peptide to agarose beads. The purified product was concentrated to approximately 1 mg/ml and dialyzed against 1×PBS. The yield obtained from these purifications is summarized in Table 1068 below.

TABLE 1068 Affinity purified antibody yield

| Lot Number | Rabbit | Concentration | Volume | Total Yield | Buffer |
|---|---|---|---|---|---|
| 18976C | 8350 | 1.20 mg/ml | 45.0 ml | 54.2 mg | 0.02 M Potassium Phosphate, 0.15 M Sodium Chloride, pH 7.2, |
| 18977C | 8351 | 1.15 mg/ml | 75.0 ml | 86.3 mg | 0.02 M Potassium Phosphate, 0.15 M Sodium Chloride, pH 7.2, |
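The total yields in Table 1068 follow from concentration multiplied by volume. The sketch below recomputes them as a consistency check; the function and variable names are illustrative only. The small difference for lot 18976C (54.0 mg computed against 54.2 mg reported) is presumably due to rounding of the reported concentration, which is an assumption and is not stated in the text.

```python
def total_yield_mg(concentration_mg_per_ml, volume_ml):
    """Total protein yield (mg) from concentration (mg/ml) and volume (ml)."""
    return concentration_mg_per_ml * volume_ml


# (concentration mg/ml, volume ml, reported total yield mg), copied from Table 1068.
lots = {
    "18976C": (1.20, 45.0, 54.2),
    "18977C": (1.15, 75.0, 86.3),
}

for lot, (conc, vol, reported) in lots.items():
    computed = total_yield_mg(conc, vol)
    # Show computed and reported values side by side, with their difference.
    print(f"{lot}: computed {computed:.2f} mg, reported {reported} mg, "
          f"difference {reported - computed:+.2f} mg")
```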

Purified antibodies were assayed by ELISA for reactivity towards the immunogen conjugated to BSA, recombinant splice variant protein and wild type protein. Results are summarized in FIG. 94. These two antibody preparations showed a good recognition of HSSTROL3_P9 (SEQ ID NO:1787) and low recognition of WT MMP11 (SEQ ID NO:1786). Therefore, both lots were used for Assay Development.

Reactivity of the purified antibodies to both the splice variant and the wild type proteins was also tested by a Western blot analysis. The results showed good recognition of HSSTROL3_P9 (SEQ ID NO:1787) splice variant and no recognition of the WT MMP11 (SEQ ID NO:1786) protein (see FIGS. 95 and 96).

2.3. Mouse Monoclonal Antibody Development

2.3.1 Mouse Immunization and Sera Testing

Twenty Balb/c mice were immunized with CGEN6301 conjugated to KLH. Immunization and bleeding schedules are summarized in Table 1069.

TABLE 1069 Summary of Mouse Immunizations, Test Bleeds and Final Boosting Schedules. Scheduled Date

| Peptide # | Pre-Bleed | Initial Injection (100 ug IP/CFA) | Boost #1 (50 ug IP/IFA) | Test Bleed #1 | Boost #2 (50 ug IP/IFA) | Test Bleed #2 | Boost #3 (50 ug IP/IFA) | Test Bleed #3 | Final Boost (50 ug IP) |
|---|---|---|---|---|---|---|---|---|---|
| CGEN6301 | Jun. 20, 2006 | Jun. 22, 2006 | Jul. 06, 2006 | Jul. 17, 2006 | Jul. 27, 2006 | Aug. 7, 2006 | Sep. 19, 2006 | Sep. 29, 2006 | Aug. 25, 2006; Aug. 26, 2006; Oct. 09, 2006 |

Test bleeds were collected and antibody titers were determined by ELISA using CGEN6301 peptide conjugated to BSA, HSSTROL3_P9 (SEQ ID NO:1787) splice variant and WT MMP11 protein (SEQ ID NO:1786) (data not shown).

Out of twenty mice immunized with CGEN6301 peptide, 6 showed high antibody titers to HSSTROL3_P9 (SEQ ID NO:1787) splice variant and limited recognition of the WT MMP11 protein (SEQ ID NO:1786). These were selected for hybridoma production.

2.3.2. Cell Fusion and Screening

Hybridoma cell lines were developed by performing splenocyte:myeloma fusions using the spleens from two mice for each fusion. Three fusions in total were performed using the best mouse responders. The fusion partner used was the SP2/0 Ag 14 (CRL-1581) myeloma cell line. The splenocytes and cell line were fused using polyethylene glycol. The fused cells were allowed to grow for 7-10 days prior to screening. The resulting hybridoma clones were screened by a two-step strategy described below:

-   -   1. Primary Screening Step        -   a. Direct ELISA using Cgen6301 peptide-BSA conjugate: Only            positive reacting clones with sufficiently high titers (OD            at 450 nm >2) were carried forward.        -   b. Class and subclass determination: Positive clones were            expanded and isotyped to determine antibody class and            subclass. Only IgG class antibodies were carried forward.            The preferred order of subclass clones is:            IgG₁>IgG_(2a)>IgG_(2b)>IgG₃.

Clones that were approved by the primary screening criteria were transferred for secondary screening.

-   -   2. Secondary Screening Step: Direct ELISA using HSSTROL3_P9 (SEQ        ID NO:1787) splice variant protein and WT MMP11 (SEQ ID NO:1786)        protein. Only clones reacting positively with the splice variant        and negatively with the wild type protein were carried forward.        Data collected during the secondary screening of post-fusion        products is summarized in Table 1070 below.

TABLE 1070 Summary of Secondary Screening Results for Post-Fusion Clones. Reactivity in Direct ELISA (OD 450 nm)

| Peptide Immunogen | Parental Clone Designation | Isotype(s) | Peptide-BSA | SVr | WT |
|---|---|---|---|---|---|
| CGEN6301 | 13E1 | IgG3 | 3.404 | >4.000 | 0.125 |
| | 5A10 | IgG3 | 3.073 | >4.000 | 0.106 |
| | 7B7 | IgG3 | 2.794 | >4.000 | 0.088 |
| | 5A8 | * | 1.900 | 3.837 | 0.084 |
| | 12F6 | * | 2.101 | 1.789 | 0.100 |
| | 5C6 | * | 2.337 | >4.000 | 0.130 |
| | 7G5 | * | 2.149 | 2.725 | 0.099 |
| | 7G11 | IgG1 | 2.274 | 2.602 | 0.115 |
| | 5F6 | IgG3 | 2.104 | 3.945 | 0.120 |
| | 5D5 | IgG3 | 2.004 | >4.000 | 0.153 |
| | 5D6 | IgG3 | 2.763 | >4.000 | 0.143 |

\* Mixed population: parental clones demonstrated more than one isotype; once determined to be monoclonal, the clones were re-isotyped.
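The secondary screening criterion (positive with the splice variant, negative with the wild type protein) can be sketched as a simple filter over the Table 1070 readings. The text does not give numeric cutoffs for "positive" and "negative", so the thresholds `pos_cutoff` and `neg_cutoff` below are hypothetical, as are the function names. Readings reported as ">4.000" are stored as 4.0, so the ratios for those clones are lower bounds.

```python
# (clone, peptide-BSA, SVr, WT) OD 450 nm readings copied from Table 1070.
# ">4.000" is stored as 4.0, i.e. a lower bound on the true reading.
TABLE_1070 = [
    ("13E1", 3.404, 4.000, 0.125), ("5A10", 3.073, 4.000, 0.106),
    ("7B7", 2.794, 4.000, 0.088), ("5A8", 1.900, 3.837, 0.084),
    ("12F6", 2.101, 1.789, 0.100), ("5C6", 2.337, 4.000, 0.130),
    ("7G5", 2.149, 2.725, 0.099), ("7G11", 2.274, 2.602, 0.115),
    ("5F6", 2.104, 3.945, 0.120), ("5D5", 2.004, 4.000, 0.153),
    ("5D6", 2.763, 4.000, 0.143),
]


def selectivity_ratio(svr_od, wt_od):
    """Ratio of splice-variant signal to wild-type signal."""
    return svr_od / wt_od


def passes_secondary_screen(svr_od, wt_od, pos_cutoff=1.0, neg_cutoff=0.3):
    """Hypothetical cutoffs: SVr signal at or above pos_cutoff, WT at or below neg_cutoff."""
    return svr_od >= pos_cutoff and wt_od <= neg_cutoff


selected = [clone for clone, _, svr, wt in TABLE_1070
            if passes_secondary_screen(svr, wt)]
ratios = {clone: selectivity_ratio(svr, wt) for clone, _, svr, wt in TABLE_1070}
print(len(selected), "clones pass;", "highest ratio:", max(ratios, key=ratios.get))
```

With these assumed cutoffs all 11 clones in the table pass, which is consistent with the count of positive parental clones reported in the text; different cutoffs would of course change the outcome.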

A total of 11 positive parental clones were identified for the HSSTROL3_P9 (SEQ ID NO:1398) project.

These were then transferred for expansion and subcloning in order to prepare monoclonal cell populations.

2.3.3. Subcloning and Colony Expansion

Up to 2 subclones per positive parental clone were obtained by limiting dilution for each of the 11 clones transferred to this stage. All subclones generated in this step were evaluated by a direct ELISA test with CGEN6301 peptide-BSA conjugate, HSSTROL3_P9 (SEQ ID NO:1787) splice variant and WT MMP11 (SEQ ID NO:1786) proteins.

Table 1071 shows reactivity of successfully subcloned parental cell lines produced from splenocyte fusions of animals injected with CGEN6301. All subclones designated in Table 1071 were cryopreserved for future long term use.

TABLE 1071 Summary of Secondary Screening Results for CGEN6301 Peptide Immunizations. Reactivity in Direct ELISA (OD 450 nm)

| Peptide Immunogen | Parental Clone Designation | Isotype(s) | Subclone Designation | Peptide-BSA | SVr | WT |
|---|---|---|---|---|---|---|
| CGEN6301 | 5A10 | IgG3/kappa | 5A10.HI | 3.441 | >4.000 | 0.228 |
| | 5A10 | IgG3/kappa | 5A10.H6 | 3.321 | >4.000 | 0.211 |
| | 13E1 | IgG3/kappa | 13E1.G1.F3 | 3.316 | >4.000 | 0.143 |
| | 13E1 | IgG3/kappa | 13E1.G1.G1 | 3.236 | >4.000 | 0.159 |
| | 7B7 | IgG3/kappa | 7B7.C12.E12 | 2.920 | >4.000 | 0.114 |
| | 7B7 | IgG3/kappa | 7B7.C12.F7 | 2.968 | >4.000 | 0.108 |
| | 5D6 | IgG3/kappa | 5D6.E3 | 3.548 | 3.906 | 0.248 |
| | 5D6 | IgG3/kappa | 5D6.H4 | 3.502 | 3.929 | 0.290 |
| | 5D5 | IgG3/kappa | 5D5.G1 | 3.613 | >4.000 | 0.295 |
| | 5D5 | IgG3/kappa | 5D5.G7 | 3.502 | >4.000 | 0.290 |
| | 7G11 | IgG1/kappa | 7G11.F6.E1 | 3.418 | 3.740 | 0.231 |
| | 7G11 | IgG1/kappa | 7G11.F6.H4 | 3.211 | 3.746 | 0.208 |
| | 5F6 | IgG3/kappa | 5F6.E9.F4 | 3.528 | >4.000 | 0.299 |
| | 5F6 | IgG3/kappa | 5F6.E9.F5 | 3.408 | >4.000 | 0.322 |
| | 5C6 | IgG3/kappa | 5C6.H5.H8.H | 3.578 | >4.000 | 0.282 |
| | 5C6 | IgG3/kappa | 5C6.H5.H8.H | 3.475 | >4.000 | 0.297 |

2.3.4. Monoclonal Antibody Production and Purification

Subclones demonstrating high titers to HSSTROL3_P9 (SEQ ID NO:1787) and the immunogen peptide CGEN6301, and low titers to WT MMP11 (SEQ ID NO:1786), were selected for antibody production (Table 1072).

TABLE 1072 Subclones Selected for Antibody Production.

| Peptide Immunogen | Parental Clone Designation | Isotype(s) | Subclone Designation |
|---|---|---|---|
| CGEN6301 | 13E1 | IgG3/kappa | 13E1.G1.F3 |
| | 7G11 | IgG1/kappa | 7G11.F6.E1 |

Subclones listed in Table 1072 were cultured in 2,000 ml roller bottles for antibody production. Protein A purification was performed on 200 ml of ×10 concentrated roller bottle supernatant diluted with an equal volume of sample buffer. A single pass was run over a Protein A sepharose column and the eluted product was dialyzed against PBS. Prior to final vialing each antibody was filter sterilized (0.22 um).

Antibody yield and concentration were determined after purification using conventional methods and are summarized in Table 1073.

TABLE 1073 Monoclonal Antibody Yield and Concentration.

| Peptide Immunogen | Subclone Designation | Protein Concentration (mg/ml) | Volume (ml) | Yield Amount (mg) | Lot# | Buffer |
|---|---|---|---|---|---|---|
| CGEN6301 | 13E1.G1.F3 | 2.26 mg/ml | 58 ml | 131 mg | 18944C | 0.02 M Potassium Phosphate, 0.15 M Sodium Chloride, pH |
| | 7G11.F6.E1 | 1.37 mg/ml | 68 ml | 93 mg | 19032C | 0.02 M Potassium Phosphate, 0.15 M Sodium Chloride, pH |

Purified antibodies were assayed by ELISA for reactivity towards CGEN6301 peptide (SEQ ID NO:1781) conjugated to BSA, HSSTROL3_P9 (SEQ ID NO:1787) splice variant and WT MMP11 (SEQ ID NO:1786). Results appear in FIG. 97. FIG. 97 shows that clone 13E1.G1.F3 (lot 18944C) has a higher recognition towards HSSTROL3_P9 (SEQ ID NO:1787) splice variant and CGEN6301 (SEQ ID NO:1781) peptide than clone 7G11.F6.E1 (antibody lot 19032C).

3. HSSTROL3_P9 (SEQ ID NO:1398) Assay Development

Next, the Assay Development stage of the HSSTROL3_P9 (SEQ ID NO:1398) project was performed with serum samples of Non Small Cell Lung Carcinoma patients and controls (i.e. patients who were not suffering from lung cancer).

Four antibodies, described above, were used for assay development:

-   -   Two polyclonal antibodies (Rockland polyclonals Rabbit 8350 &
        Rabbit 8351) developed against a synthetic peptide
        CKK-Ahx-FFQGTTGVSTPAPGV (SEQ ID NO:1781) comprising the unique
        tail of HSSTROL3_P9 (SEQ ID NO:1398).
    -   Two monoclonal antibodies (Rockland monoclonals clone 13E1.G1.F3
        and clone 7G11.F6.E1) developed against a synthetic peptide
        CKK-Ahx-FFQGTTGVSTPAPGV (SEQ ID NO:1781) comprising the unique
        tail of HSSTROL3_P9 (SEQ ID NO:1398).

Three ELISA formats were developed in order to identify the most sensitive assay format for the detection of HSSTROL3_P9 (SEQ ID NO:1398) splice variant protein in serum:

-   -   Sandwich ELISA
    -   Antibody capture competitive ELISA
    -   Antigen capture competitive ELISA

3.1. Sandwich ELISA

In order to find the best sandwich pair, various combinations of antibodies raised in different hosts were tested for their ability to detect serial dilutions of HSSTROL3_P9 (SEQ ID NO:1787) spiked in serum. Antibodies from the same host were not tested in this format.

The best sandwich assay format for the detection of HSSTROL3_P9 (SEQ ID NO:1398) was found to be: Format #1:

-   -   Coat: Mab 13E1.G1.F3    -   Detector Rabbit polyclonal (Rb 8351)    -   LOD for HSSTROL3_P9 (SEQ ID NO:1787) ˜30 ng/ml

3.2 Competitive ELISA

Two competitive assay formats were developed: antibody capture and antigen capture, and the best conditions were determined for each format.

3.2.1 Antigen Capture Competitive ELISA

ELISA plates were coated with HSSTROL3_P9 (SEQ ID NO:1787) protein, and binding to antibody pre-incubated with HSSTROL3_P9 (SEQ ID NO:1787) protein-spiked serum samples was assessed.

The best optimized antigen capture assay was: Format #2:

-   -   Coat HSSTROL3_P9 (SEQ ID NO:1787)    -   Detector Rabbit polyclonal (Rabbit 8351)    -   LOD for HSSTROL3_P9 (SEQ ID NO:1787) ˜70 ng/ml

3.2.2. Antibody Capture Competitive ELISA

ELISA plates were coated with the antibody and its binding to labeled (biotinylated) HSSTROL3_P9 (SEQ ID NO:1787) protein-spiked serum samples was assessed. Non-labeled HSSTROL3_P9 (SEQ ID NO:1787) protein was tested as competing antigen; mouse 13E1.G1.F3 and both rabbit antibodies were tested as capture antibodies. The best optimized antibody capture assay format was: Format #3:

-   -   Coat Rabbit polyclonal (Rb 8351)    -   Detector HSSTROL3_P9 (SEQ ID NO:1787) biotin-labeled protein    -   LOD for HSSTROL3_P9 (SEQ ID NO:1787) ˜50 ng/ml

The sandwich ELISA (Format #1) appeared to be somewhat more sensitive than both competitive formats (Formats #2 & 3). Therefore, this format was selected for screening the serum samples.
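The comparison of the three formats reduces to picking the lowest limit of detection (LOD). A minimal sketch using the approximate LOD values reported above (about 30, 70 and 50 ng/ml) is given here; the dictionary keys are illustrative labels only.

```python
# Approximate LODs (ng/ml) for HSSTROL3_P9 as reported for Formats #1 to #3.
lod_ng_per_ml = {
    "Format #1 (sandwich)": 30,
    "Format #2 (antigen capture competitive)": 70,
    "Format #3 (antibody capture competitive)": 50,
}

# A lower LOD means a more sensitive assay, so the minimum is selected.
most_sensitive = min(lod_ng_per_ml, key=lod_ng_per_ml.get)
print(most_sensitive, lod_ng_per_ml[most_sensitive], "ng/ml")
```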

4. Serum Screening

Fifty serum samples from Non Small Cell Lung Cancer (NSCLC) patients and 50 control sera were tested using the above-described HSSTROL3_P9 (SEQ ID NO:1398) sandwich assay.

4.1 Serum Samples Screening by Sandwich ELISA (Format #1)

The plates were coated overnight with mouse 13E1.G1.F3 antibody. Bound antigen was detected using rabbit 8351 antibodies. Fifty sera from NSCLC patients and 50 age- and gender-matched control sera (i.e. from subjects not suffering from lung cancer) were tested in this ELISA format. The 50 control serum samples consisted of 36 different samples plus duplicates of 14 of them. The reference curve was prepared by diluting HSSTROL3_P9 (SEQ ID NO:1787) splice variant protein into pooled normal serum.
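Reading a sample concentration off a reference curve can be sketched as follows. The text does not state how the reference curve was fitted, so the log-linear interpolation between neighbouring standards used here is an assumption, and the standard concentrations and OD values are hypothetical placeholders rather than measured data. Readings outside the range of the standards return `None`, which corresponds to a sample that cannot be quantified by the curve.

```python
import math

# Hypothetical reference curve: (concentration in ng/ml, OD 450 nm).
# These values are placeholders for illustration, not data from the study.
STANDARDS = [(30, 0.15), (100, 0.40), (300, 0.95), (1000, 1.90)]


def interpolate_concentration(od, standards=STANDARDS):
    """Estimate concentration (ng/ml) from an OD reading.

    Interpolates linearly in log10(concentration) between the two standards
    that bracket the reading; assumes OD rises with concentration.
    Returns None when the reading lies outside the range of the standards.
    """
    pts = sorted(standards)
    if od < pts[0][1] or od > pts[-1][1]:
        return None
    for (c_lo, od_lo), (c_hi, od_hi) in zip(pts, pts[1:]):
        if od_lo <= od <= od_hi:
            frac = (od - od_lo) / (od_hi - od_lo)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    return None


print(interpolate_concentration(0.05))  # reading below the lowest standard
print(interpolate_concentration(0.40))  # reading equal to the 100 ng/ml standard
```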

The results showed that, of the 100 samples tested in this assay, HSSTROL3_P9 was detected in only one sample (patient 1388P).

In order to verify the results observed in the first serum screening, a second serum screen was performed using the same sandwich ELISA format. The same 50 NSCLC patient sera and 28 of the 50 control sera were assayed. The results observed were very similar to those obtained in the first serum test: the same one sample (1388P) was detected. The results of the two serum screens were therefore consistent.

The overall results suggest that HSSTROL3_P9 (SEQ ID NO:1398) is probably present in serum samples from lung cancer patients; however, its concentration is too low to be detected by this assay format.

Summary

A collection of monoclonal and polyclonal antibodies specific for HSSTROL3_P9 (SEQ ID NO:1398) splice variant was developed. These antibodies were used to test the potential of HSSTROL3_P9 (SEQ ID NO:1398) to become a diagnostic biomarker for Non Small Cell Lung Cancer diagnosis. A few ELISA formats were developed using this antibody collection for the determination of serum levels of HSSTROL3_P9 (SEQ ID NO:1398) splice variant in healthy and diseased individuals. The sandwich ELISA format was selected to test HSSTROL3_P9 (SEQ ID NO:1398) serum levels.

It appears that this ELISA format is not sufficiently sensitive to detect expression of HSSTROL3_P9 (SEQ ID NO:1398) in most of the tested samples. However, HSSTROL3_P9 (SEQ ID NO:1398) splice variant was found consistently to be present in one serum sample, suggesting that it might also be present in other serum samples but below the detection limit.

It is likely that improving assay sensitivity by 10 fold through the use of antibodies with higher binding affinities or by the use of novel detection technologies will allow the detection of CenMMP11 in serum samples. A more sensitive test may reliably enable the assessment of HSSTROL3_P9 (SEQ ID NO:1398) diagnostic potential.

Conclusions

HSSTROL3_P9 (SEQ ID NO:1398) splice variant appears to be a specific molecular diagnostic marker for lung cancer.

Description for Cluster HUMTREFAC

Cluster HUMTREFAC features 2 transcript(s) and 7 segment(s) of interest, the names for which are given in Tables 1074 and 1075, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 1076.

TABLE 1074 Transcripts of interest

| Transcript Name | Sequence ID No. |
|---|---|
| HUMTREFAC_PEA_2_T4 | 131 |
| HUMTREFAC_PEA_2_T5 | 132 |

TABLE 1075 Segments of interest

| Segment Name | Sequence ID No. |
|---|---|
| HUMTREFAC_PEA_2_node_0 | 903 |
| HUMTREFAC_PEA_2_node_9 | 904 |
| HUMTREFAC_PEA_2_node_2 | 905 |
| HUMTREFAC_PEA_2_node_3 | 906 |
| HUMTREFAC_PEA_2_node_4 | 907 |
| HUMTREFAC_PEA_2_node_5 | 908 |
| HUMTREFAC_PEA_2_node_8 | 909 |

TABLE 1076 Proteins of interest

| Protein Name | Sequence ID No. | Corresponding Transcript(s) |
|---|---|---|
| HUMTREFAC_PEA_2_P7 | 1399 | HUMTREFAC_PEA_2_T5 (SEQ ID NO: 132) |
| HUMTREFAC_PEA_2_P8 | 1400 | HUMTREFAC_PEA_2_T4 (SEQ ID NO: 131) |

These sequences are variants of the known protein Trefoil factor 3 precursor (SwissProt accession identifier TFF3_HUMAN; known also according to the synonyms Intestinal trefoil factor; hP1.B), SEQ ID NO: 1456, referred to herein as the previously known protein.

Protein Trefoil factor 3 precursor (SEQ ID NO:1456) is known or believed to have the following function(s): May have a role in promoting cell migration (motogen). The sequence for protein Trefoil factor 3 precursor is given at the end of the application, as “Trefoil factor 3 precursor amino acid sequence”. Known polymorphisms for this sequence are as shown in Table 1077.

TABLE 1077 Amino acid mutations for Known Protein

| SNP position(s) on amino acid sequence | Comment |
|---|---|
| 74-76 | QEA -> TRKT |

Protein Trefoil factor 3 precursor (SEQ ID NO:1456) localization is believed to be Secreted. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: defense response; digestion, which are annotation(s) related to Biological Process; and extracellular, which are annotation(s) related to Cellular Component.

The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <dot expasy dot ch/sprot>; or Locuslink, available from <dot ncbi dot nlm dot nih dot gov/projects/LocusLink/>.

Cluster HUMTREFAC can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term “number” in the right hand column of the table and the numbers on the y-axis of FIG. 41 refer to weighted expression of ESTs in each category, as “parts per million” (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
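The "parts per million" figure described above is a ratio scaled by one million. The sketch below shows only that scaling; the EST counts are hypothetical, the function name is illustrative, and any weighting applied in the actual analysis is not described in this passage and is therefore not modelled.

```python
def expression_ppm(cluster_est_count, total_est_count):
    """ESTs of one cluster per million ESTs in the same tissue category."""
    if total_est_count <= 0:
        raise ValueError("total EST count must be positive")
    return cluster_est_count / total_est_count * 1_000_000


# Hypothetical counts: 57 cluster ESTs among 1,000,000 ESTs in a category.
print(expression_ppm(57, 1_000_000))
```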

Overall, the following results were obtained as shown with regard to the histograms in FIG. 41 and Table 1078. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: a mixture of malignant tumors from different tissues, breast malignant tumors, pancreas carcinoma and prostate cancer.

TABLE 1078 Normal tissue distribution

| Name of Tissue | Number |
|---|---|
| adrenal | 40 |
| colon | 797 |
| epithelial | 95 |
| general | 39 |
| liver | 0 |
| lung | 57 |
| lymph nodes | 3 |
| breast | 0 |
| muscle | 3 |
| pancreas | 2 |
| prostate | 16 |
| stomach | 0 |
| Thyroid | 257 |
| uterus | 54 |

TABLE 1079 P values and ratios for expression in cancerous tissue

| Name of Tissue | P1 | P2 | SP1 | R3 | SP2 | R4 |
|---|---|---|---|---|---|---|
| adrenal | 6.4e−01 | 6.9e−01 | 7.1e−01 | 1.1 | 7.8e−01 | 0.9 |
| colon | 4.6e−01 | 5.7e−01 | 9.7e−01 | 0.5 | 1 | 0.4 |
| epithelial | 2.4e−02 | 3.4e−01 | 9.5e−10 | 2.0 | 5.3e−02 | 1.1 |
| general | 2.5e−04 | 3.9e−02 | 1.4e−28 | 3.6 | 1.9e−10 | 1.9 |
| liver | 1 | 6.8e−01 | 1 | 1.0 | 6.9e−01 | 1.4 |
| lung | 4.8e−01 | 7.6e−01 | 2.2e−03 | 1.0 | 1.6e−01 | 0.5 |
| lymph nodes | 5.1e−01 | 8.0e−01 | 2.3e−02 | 5.0 | 1.9e−01 | 2.1 |
| breast | 7.6e−02 | 1.2e−01 | 3.1e−06 | 12.0 | 1.1e−03 | 6.5 |
| muscle | 9.2e−01 | 4.8e−01 | 1 | 0.8 | 3.9e−01 | 2.1 |
| pancreas | 1.2e−01 | 2.4e−01 | 5.7e−03 | 6.5 | 2.1e−02 | 4.6 |
| prostate | 1.5e−01 | 2.7e−01 | 9.9e−10 | 8.1 | 3.1e−07 | 5.7 |
| stomach | 3.0e−01 | 1.3e−01 | 5.0e−01 | 2.0 | 6.7e−02 | 2.8 |
| Thyroid | 6.4e−01 | 6.4e−01 | 9.6e−01 | 0.5 | 9.6e−01 | 0.5 |
| uterus | 4.1e−01 | 7.3e−01 | 7.5e−02 | 1.3 | 4.0e−01 | 0.8 |

As noted above, cluster HUMTREFAC features 2 transcript(s), which were listed in Table 1074 above. These transcript(s) encode for protein(s) which are variant(s) of protein Trefoil factor 3 precursor (SEQ ID NO:1456). A description of each variant protein according to the present invention is now provided.

Variant protein HUMTREFAC_PEA_2_P7 (SEQ ID NO:1399) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMTREFAC_PEA_2_T5 (SEQ ID NO:132). The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein HUMTREFAC_PEA_2_P7 (SEQ ID NO:1399) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 1080 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTREFAC_PEA_2_P7 (SEQ ID NO:1399) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1080 Amino acid mutations

| SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP? |
|---|---|---|
| 5 | A -> S | No |
| 5 | A -> T | No |
| 14 | A -> V | Yes |
| 43 | L -> M | No |
| 60 | P -> S | Yes |
| 123 | S -> * | Yes |

Variant protein HUMTREFAC_PEA_2_P7 (SEQ ID NO:1399) is encoded by the following transcript(s): HUMTREFAC_PEA_2_T5 (SEQ ID NO:132), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMTREFAC_PEA_2_T5 (SEQ ID NO:132) is shown in bold; this coding portion starts at position 278 and ends at position 688. The transcript also has the following SNPs as listed in Table 1081 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTREFAC_PEA_2_P7 (SEQ ID NO:1399) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1081 Nucleic acid SNPs

| SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP? |
|---|---|---|
| 233 | A -> G | Yes |
| 290 | G -> A | No |
| 290 | G -> T | No |
| 318 | C -> T | Yes |
| 404 | C -> A | No |
| 404 | C -> T | No |
| 455 | C -> T | Yes |
| 645 | C -> A | Yes |
| 685 | C -> T | No |
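The nucleotide positions in Table 1081 can be related to the amino acid positions in Table 1080 through the stated coding portion (positions 278 to 688). The sketch below performs that conversion, assuming 1-based positions and a reading frame that begins at the first coding nucleotide; the function name is illustrative. Positions outside the coding portion return `None`.

```python
CDS_START = 278  # first nucleotide of the coding portion of HUMTREFAC_PEA_2_T5
CDS_END = 688    # last nucleotide of the coding portion


def codon_number(nt_position, cds_start=CDS_START, cds_end=CDS_END):
    """Map a 1-based transcript position to a 1-based codon (amino acid) number."""
    if not cds_start <= nt_position <= cds_end:
        return None  # position lies in an untranslated region
    return (nt_position - cds_start) // 3 + 1


# Distinct SNP positions from Table 1081.
snp_positions = [233, 290, 318, 404, 455, 645, 685]
mapping = {pos: codon_number(pos) for pos in snp_positions}
print(mapping)
```

Under these assumptions, positions 290, 318, 404, 455 and 645 map to codons 5, 14, 43, 60 and 123, matching the amino acid positions in Table 1080. Position 233 falls before the coding portion, and position 685 maps to codon 136, which does not appear in Table 1080, as would be expected if that change is silent.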

Variant protein HUMTREFAC_PEA_2_P8 (SEQ ID NO:1400) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMTREFAC_PEA_2_T4 (SEQ ID NO:131). An alignment is given to the known protein (Trefoil factor 3 precursor (SEQ ID NO:1456)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HUMTREFAC_PEA_2_P8 (SEQ ID NO:1400) and TFF3_HUMAN (SEQ ID NO:1456):

1. An isolated chimeric polypeptide encoding for HUMTREFAC_PEA_2_P8 (SEQ ID NO:1400), comprising a first amino acid sequence being at least 90% homologous to MAARALCMLGLVLALLSSSSAEEYVGL corresponding to amino acids 1-27 of TFF3_HUMAN (SEQ ID NO:1456), which also corresponds to amino acids 1-27 of HUMTREFAC_PEA_2_P8 (SEQ ID NO:1400), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence WKVHLPKGEGFSSG (SEQ ID NO: 1774) corresponding to amino acids 28-41 of HUMTREFAC_PEA_2_P8 (SEQ ID NO:1400), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of HUMTREFAC_PEA_2_P8 (SEQ ID NO:1400), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence WKVHLPKGEGFSSG (SEQ ID NO: 1774) in HUMTREFAC_PEA_2_P8 (SEQ ID NO:1400).
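The segment lengths and percentage thresholds recited above can be checked with a short sketch. The two sequences are copied from the comparison report. Treating "homologous" as ungapped position-by-position identity is a simplifying assumption made for illustration only, and the candidate tail with a single substitution is hypothetical.

```python
HEAD = "MAARALCMLGLVLALLSSSSAEEYVGL"  # amino acids 1-27, shared with TFF3_HUMAN
TAIL = "WKVHLPKGEGFSSG"               # amino acids 28-41, unique tail (SEQ ID NO: 1774)


def percent_identity(seq_a, seq_b):
    """Ungapped percent identity between two sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length for an ungapped comparison")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# The two segments are contiguous and in sequential order.
variant = HEAD + TAIL
print(len(HEAD), len(TAIL), len(variant))

# Hypothetical candidate tail carrying one substitution (last G replaced by A).
candidate_tail = "WKVHLPKGEGFSSA"
identity = percent_identity(TAIL, candidate_tail)
print(f"{identity:.1f}% identical; meets 90% threshold: {identity >= 90.0}")
```

In this simplified reading, a single substitution in the 14-residue tail gives 13 of 14 matching positions, which satisfies the 70% to 90% thresholds but not the 95% one.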

The location of the variant protein was determined according to resultsfrom a number of different software programs and analyses, includinganalyses from SignalP and other specialized programs. The variantprotein is believed to be located as follows with regard to the cell:secreted. The protein localization is believed to be secreted becauseboth signal-peptide prediction programs predict that this protein has asignal peptide, and neither trans-membrane region prediction programpredicts that this protein has a trans-membrane region.

Variant protein HUMTREFAC_PEA_2_P8 (SEQ ID NO:1400) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 1082 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTREFAC_PEA_2_P8 (SEQ ID NO:1400) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1082 Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
5 | A -> S | No
5 | A -> T | No
14 | A -> V | Yes

Variant protein HUMTREFAC_PEA_2_P8 (SEQ ID NO:1400) is encoded by the following transcript(s): HUMTREFAC_PEA_2_T4 (SEQ ID NO:131), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMTREFAC_PEA_2_T4 (SEQ ID NO:131) is shown in bold; this coding portion starts at position 278 and ends at position 400. The transcript also has the following SNPs as listed in Table 1083 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTREFAC_PEA_2_P8 (SEQ ID NO:1400) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1083 Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
233 | A -> G | Yes
290 | G -> A | No
290 | G -> T | No
318 | C -> T | Yes
515 | C -> A | No
515 | C -> T | No
566 | C -> T | Yes
756 | C -> A | Yes
796 | C -> T | No
1265 | A -> C | No
1266 | A -> T | No
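The link between the nucleotide SNP positions and the amino acid positions can be illustrated with a short calculation. This is a sketch under assumptions: it takes position 278 as the first base of the first codon and treats the reading frame as uninterrupted through position 400. The function name is hypothetical.

```python
from typing import Optional

CDS_START, CDS_END = 278, 400  # coding portion of the transcript, as stated above


def codon_index(nt_position: int, cds_start: int, cds_end: int) -> Optional[int]:
    """Map a 1-based nucleotide position to a 1-based amino acid position.

    Returns None when the position lies outside the coding portion.
    """
    if not cds_start <= nt_position <= cds_end:
        return None
    return (nt_position - cds_start) // 3 + 1


# A few SNP positions taken from Table 1083.
for pos in (233, 290, 318, 515):
    print(pos, codon_index(pos, CDS_START, CDS_END))
```

Under these assumptions, nucleotide positions 290 and 318 fall in codons 5 and 14, which are the amino acid positions listed in Table 1082, while positions 233 and 515 fall outside the coding portion.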

As noted above, cluster HUMTREFAC features 7 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster HUMTREFAC_PEA_2_node_0 (SEQ ID NO:903) according to the present invention is supported by 188 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTREFAC_PEA_2_T4 (SEQ ID NO:131) and HUMTREFAC_PEA_2_T5 (SEQ ID NO:132). Table 1084 below describes the starting and ending position of this segment on each transcript.

TABLE 1084 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMTREFAC_PEA_2_T4 (SEQ ID NO:131) | 1 | 359
HUMTREFAC_PEA_2_T5 (SEQ ID NO:132) | 1 | 359

Segment cluster HUMTREFAC_PEA_2_node_9 (SEQ ID NO:904) according to the present invention is supported by 150 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTREFAC_PEA_2_T4 (SEQ ID NO:131) and HUMTREFAC_PEA_2_T5 (SEQ ID NO:132). Table 1085 below describes the starting and ending position of this segment on each transcript.

TABLE 1085 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMTREFAC_PEA_2_T4 (SEQ ID NO:131) | 681 | 1266
HUMTREFAC_PEA_2_T5 (SEQ ID NO:132) | 570 | 747

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.

Segment cluster HUMTREFAC_PEA_2_node_2 (SEQ ID NO:905) according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTREFAC_PEA_2_T4 (SEQ ID NO:131). Table 1086 below describes the starting and ending position of this segment on each transcript.

TABLE 1086 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMTREFAC_PEA_2_T4 (SEQ ID NO:131) | 360 | 470

Segment cluster HUMTREFAC_PEA_2_node_3 (SEQ ID NO:906) according to the present invention is supported by 10 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTREFAC_PEA_2_T4 (SEQ ID NO:131) and HUMTREFAC_PEA_2_T5 (SEQ ID NO:132). Table 1087 below describes the starting and ending position of this segment on each transcript.

TABLE 1087 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMTREFAC_PEA_2_T4 (SEQ ID NO:131) | 471 | 514
HUMTREFAC_PEA_2_T5 (SEQ ID NO:132) | 360 | 403

Segment cluster HUMTREFAC_PEA_2_node_4 (SEQ ID NO:907) according to the present invention is supported by 197 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTREFAC_PEA_2_T4 (SEQ ID NO:131) and HUMTREFAC_PEA_2_T5 (SEQ ID NO:132). Table 1088 below describes the starting and ending position of this segment on each transcript.

TABLE 1088 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMTREFAC_PEA_2_T4 (SEQ ID NO:131) | 515 | 611
HUMTREFAC_PEA_2_T5 (SEQ ID NO:132) | 404 | 500

Segment cluster HUMTREFAC_PEA_2_node_5 (SEQ ID NO:908) according to the present invention is supported by 187 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTREFAC_PEA_2_T4 (SEQ ID NO:131) and HUMTREFAC_PEA_2_T5 (SEQ ID NO:132). Table 1089 below describes the starting and ending position of this segment on each transcript.

TABLE 1089 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMTREFAC_PEA_2_T4 (SEQ ID NO:131) | 612 | 661
HUMTREFAC_PEA_2_T5 (SEQ ID NO:132) | 501 | 550

Segment cluster HUMTREFAC_PEA_2_node_8 (SEQ ID NO:909) according to the present invention can be found in the following transcript(s): HUMTREFAC_PEA_2_T4 (SEQ ID NO:131) and HUMTREFAC_PEA_2_T5 (SEQ ID NO:132). Table 1090 below describes the starting and ending position of this segment on each transcript.

TABLE 1090 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HUMTREFAC_PEA_2_T4 (SEQ ID NO:131) | 662 | 680
HUMTREFAC_PEA_2_T5 (SEQ ID NO:132) | 551 | 569

Variant protein alignment to the previously known protein:

Description for Cluster HSS100PCB

Cluster HSS100PCB features 1 transcript(s) and 3 segment(s) of interest, the names for which are given in Tables 1091 and 1092, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 1093.

TABLE 1091 Transcripts of interest
Transcript Name | Sequence ID No.
HSS100PCB_T1 | 133

TABLE 1092 Segments of interest
Segment Name | Sequence ID No.
HSS100PCB_node_3 | 910
HSS100PCB_node_4 | 911
HSS100PCB_node_5 | 912

TABLE 1093 Proteins of interest
Protein Name | Sequence ID No. | Corresponding Transcript(s)
HSS100PCB_P3 | 1401 | HSS100PCB_T1 (SEQ ID NO:133)

These sequences are variants of the known protein S-100P protein (SwissProt accession identifier S10P_HUMAN), SEQ ID NO:1457, referred to herein as the previously known protein, which binds two calcium ions.

The sequence for protein S-100P protein (SEQ ID NO:1457) is given at the end of the application, as “S-100P protein amino acid sequence”. Known polymorphisms for this sequence are as shown in Table 1094.

TABLE 1094 Amino acid mutations for Known Protein
SNP position(s) on amino acid sequence | Comment
32 | E -> T
44 | F -> E

The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: calcium binding; protein binding, which are annotation(s) related to Molecular Function.

The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <dot expasy dot ch/sprot/>; or LocusLink, available from <dot ncbi dot nlm dot nih dot gov/projects/LocusLink/>.

Cluster HSS100PCB can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term “number” in the right hand column of the table and the numbers on the y-axis of FIG. 42 refer to weighted expression of ESTs in each category, as “parts per million” (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
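The “parts per million” ratio described above can be sketched as follows. This is an unweighted illustration only: the weighting scheme belongs to the previously described methods and is not reproduced here, and the EST counts in the example are hypothetical.

```python
def parts_per_million(cluster_est_count: int, total_est_count: int) -> float:
    """Ratio of ESTs for one cluster to all ESTs in a tissue category,
    scaled to parts per million (no weighting applied)."""
    if total_est_count <= 0:
        raise ValueError("total_est_count must be positive")
    return 1_000_000 * cluster_est_count / total_est_count


# Hypothetical counts: 5 cluster ESTs among 100,000 ESTs in the category.
print(parts_per_million(5, 100_000))
```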

Overall, the following results were obtained as shown with regard to the histograms in FIG. 42 and Table 1095. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: a mixture of malignant tumors from different tissues.

TABLE 1095 Normal tissue distribution
Name of Tissue | Number
bladder | 41
colon | 37
epithelial | 38
general | 22
kidney | 0
liver | 0
lung | 18
breast | 0
bone marrow | 0
ovary | 0
pancreas | 0
prostate | 46
stomach | 553
uterus | 13

TABLE 1096 P values and ratios for expression in cancerous tissue
Name of Tissue | P1 | P2 | SP1 | R3 | SP2 | R4
bladder | 3.3e−01 | 2.9e−01 | 2.9e−02 | 2.8 | 3.5e−02 | 2.8
colon | 3.0e−01 | 1.9e−01 | 5.2e−01 | 1.2 | 2.4e−01 | 1.7
epithelial | 4.7e−02 | 1.6e−02 | 2.0e−01 | 1.2 | 6.1e−02 | 1.3
general | 1.1e−03 | 6.8e−05 | 1.4e−02 | 1.5 | 4.9e−04 | 1.7
kidney | 6.5e−01 | 7.2e−01 | 5.8e−01 | 1.7 | 7.0e−01 | 1.4
liver | 9.1e−01 | 4.9e−01 | 1 | 1.0 | 7.7e−02 | 2.1
lung | 6.8e−01 | 7.3e−01 | 2.2e−02 | 2.9 | 1.3e−01 | 1.7
breast | 2.8e−01 | 3.2e−01 | 4.7e−01 | 2.0 | 6.8e−01 | 1.5
bone marrow | 1 | 6.7e−01 | 1 | 1.0 | 2.8e−01 | 2.8
ovary | 2.6e−01 | 3.0e−01 | 4.7e−01 | 2.0 | 5.9e−01 | 1.7
pancreas | 3.3e−01 | 4.4e−01 | 7.6e−02 | 3.7 | 1.5e−01 | 2.8
prostate | 9.1e−01 | 9.3e−01 | 5.8e−01 | 0.6 | 7.6e−01 | 0.5
stomach | 3.7e−01 | 3.2e−01 | 1 | 0.1 | 1 | 0.3
uterus | 9.4e−01 | 7.0e−01 | 1 | 0.6 | 4.1e−01 | 1.1

As noted above, cluster HSS100PCB features 1 transcript(s), which were listed in Table 1091 above. These transcript(s) encode for protein(s) which are variant(s) of protein S-100P protein (SEQ ID NO:1457). A description of each variant protein according to the present invention is now provided.

Variant protein HSS100PCB_P3 (SEQ ID NO:1401) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSS100PCB_T1 (SEQ ID NO:133). The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein HSS100PCB_P3 (SEQ ID NO:1401) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 1097 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSS100PCB_P3 (SEQ ID NO:1401) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1097 Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
1 | M -> R | Yes
11 | M -> L | Yes
20 | L -> F | Yes

Variant protein HSS100PCB_P3 (SEQ ID NO:1401) is encoded by the following transcript(s): HSS100PCB_T1 (SEQ ID NO:133), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSS100PCB_T1 (SEQ ID NO:133) is shown in bold; this coding portion starts at position 1057 and ends at position 1533. The transcript also has the following SNPs as listed in Table 1098 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSS100PCB_P3 (SEQ ID NO:1401) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1098 Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
52 | C -> T | Yes
107 | A -> C | Yes
458 | C -> T | Yes
468 | A -> G | Yes
648 | C -> T | Yes
846 | C -> G | Yes
882 | G -> A | Yes
960 | C -> T | No
965 | C -> T | Yes
1058 | T -> G | Yes
1087 | A -> C | Yes
1114 | C -> T | Yes
1968 | G -> A | Yes
1971 | C -> T | Yes
2010 | C -> A | Yes
2099 | G -> | No

As noted above, cluster HSS100PCB features 3 segment(s), which were listed in Table 1092 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster HSS100PCB_node_3 (SEQ ID NO:910) according to the present invention is supported by 16 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSS100PCB_T1 (SEQ ID NO:133). Table 1099 below describes the starting and ending position of this segment on each transcript.

TABLE 1099 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSS100PCB_T1 (SEQ ID NO:133) | 1 | 1133

Segment cluster HSS100PCB_node_4 (SEQ ID NO:911) according to the present invention is supported by 29 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSS100PCB_T1 (SEQ ID NO:133). Table 1100 below describes the starting and ending position of this segment on each transcript.

TABLE 1100 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSS100PCB_T1 (SEQ ID NO:133) | 1134 | 1923

Segment cluster HSS100PCB_node_5 (SEQ ID NO:912) according to the present invention is supported by 141 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSS100PCB_T1 (SEQ ID NO:133). Table 1101 below describes the starting and ending position of this segment on each transcript.

TABLE 1101 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSS100PCB_T1 (SEQ ID NO:133) | 1924 | 2201
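The segment coordinates in Tables 1099 to 1101 can be checked for consistency with a short script. This is a sketch only: the coordinates are as read from those tables (segment node_3 taken as positions 1 to 1133), and the function name is hypothetical. It tests whether the segments cover the transcript from position 1 without gaps or overlaps.

```python
from typing import List, Tuple

Segment = Tuple[str, int, int]  # (name, 1-based start, 1-based inclusive end)

# Coordinates on HSS100PCB_T1 as read from Tables 1099, 1100 and 1101.
HSS100PCB_T1_SEGMENTS: List[Segment] = [
    ("HSS100PCB_node_3", 1, 1133),
    ("HSS100PCB_node_4", 1134, 1923),
    ("HSS100PCB_node_5", 1924, 2201),
]


def tiles_contiguously(segments: List[Segment]) -> bool:
    """True if the segments start at position 1 and follow one another
    with no gaps and no overlaps."""
    expected_start = 1
    for _name, start, end in sorted(segments, key=lambda s: s[1]):
        if start != expected_start or end < start:
            return False
        expected_start = end + 1
    return True


print(tiles_contiguously(HSS100PCB_T1_SEGMENTS))
```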

Description for Cluster HSU33147

Cluster HSU33147 features 2 transcript(s) and 5 segment(s) of interest, the names for which are given in Tables 1102 and 1103, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 1104.

TABLE 1102 Transcripts of interest
Transcript Name | Sequence ID No.
HSU33147_PEA_1_T1 | 1464
HSU33147_PEA_1_T2 | 1465

TABLE 1103 Segments of interest
Segment Name | Sequence ID No.
HSU33147_PEA_1_node_0 | 1276
HSU33147_PEA_1_node_2 | 1277
HSU33147_PEA_1_node_4 | 1278
HSU33147_PEA_1_node_7 | 1279
HSU33147_PEA_1_node_3 | 1280

TABLE 1104 Proteins of interest
Protein Name | Sequence ID No. | Corresponding Transcript(s)
HSU33147_PEA_1_P5 | 1415 | HSU33147_PEA_1_T1 (SEQ ID NO:1464); HSU33147_PEA_1_T2 (SEQ ID NO:1465)

These sequences are variants of the known protein Mammaglobin A precursor (SwissProt accession identifier MGBA_HUMAN; known also according to the synonyms Mammaglobin 1; Secretoglobin family 2A member 2), SEQ ID NO:1416, referred to herein as the previously known protein.

The sequence for protein Mammaglobin A precursor (SEQ ID NO:1416) is given at the end of the application, as “Mammaglobin A precursor amino acid sequence”.

It has been investigated for clinical/therapeutic use in humans, for example as a target for an antibody or small molecule, and/or as a direct therapeutic; available information related to these investigations is as follows. Potential pharmaceutically related or therapeutically related activity or activities of the previously known protein are as follows: Immunostimulant. A therapeutic role for a protein represented by the cluster has been predicted. The cluster was assigned this field because there was information in the drug database or the public databases (e.g., described herein above) that this protein, or part thereof, is used or can be used for a potential therapeutic indication: Anticancer.

The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: steroid binding, which are annotation(s) related to Molecular Function.

The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <dot expasy dot ch/sprot/>; or LocusLink, available from <dot ncbi dot nlm dot nih dot gov/projects/LocusLink/>.

Cluster HSU33147 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term “number” in the right hand column of the table and the numbers on the y-axis of FIG. 43 refer to weighted expression of ESTs in each category, as “parts per million” (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).

Overall, the following results were obtained as shown with regard to the histograms in FIG. 43 and Table 1105. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: a mixture of malignant tumors from different tissues.

TABLE 1105 Normal tissue distribution
Name of Tissue | Number
epithelial | 6
general | 2
lung | 0
breast | 131

TABLE 1106 P values and ratios for expression in cancerous tissue
Name of Tissue | P1 | P2 | SP1 | R3 | SP2 | R4
epithelial | 4.1e−02 | 6.4e−02 | 1.5e−12 | 2.6 | 2.2e−06 | 1.5
general | 1.6e−02 | 1.1e−02 | 1.2e−22 | 4.4 | 7.2e−13 | 2.4
lung | 1 | 6.3e−01 | 1 | 1.0 | 6.2e−01 | 1.6
breast | 8.6e−02 | 1.1e−01 | 3.4e−07 | 1.7 | 2.6e−03 | 1.0

As noted above, cluster HSU33147 features 2 transcript(s), which were listed in Table 1102 above. These transcript(s) encode for protein(s) which are variant(s) of protein Mammaglobin A precursor (SEQ ID NO:1416). A description of each variant protein according to the present invention is now provided.

Variant protein HSU33147_PEA_1_P5 (SEQ ID NO:1415) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSU33147_PEA_1_T1 (SEQ ID NO:1464). An alignment is given to the known protein (Mammaglobin A precursor (SEQ ID NO:1416)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HSU33147_PEA_1_P5 (SEQ ID NO:1415) and MGBA_HUMAN (SEQ ID NO:1416):

1. An isolated chimeric polypeptide encoding for HSU33147_PEA_1_P5 (SEQ ID NO:1415), comprising a first amino acid sequence being at least 90% homologous to MKLLMVLMLAALSQHCYAGSGCPLLENVISKTINPQVSKTEYKELLQEFIDDNATTNAIDELKECFLNQTDETLSNVE corresponding to amino acids 1-78 of MGBA_HUMAN (SEQ ID NO:1416), which also corresponds to amino acids 1-78 of HSU33147_PEA_1_P5 (SEQ ID NO:1415), and a second amino acid sequence being at least 90% homologous to QLIYDSSLCDLF corresponding to amino acids 82-93 of MGBA_HUMAN (SEQ ID NO:1416), which also corresponds to amino acids 79-90 of HSU33147_PEA_1_P5 (SEQ ID NO:1415), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated chimeric polypeptide encoding for an edge portion of HSU33147_PEA_1_P5 (SEQ ID NO:1415), comprising a polypeptide having a length “n”, wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise EQ, having a structure as follows: a sequence starting from any of amino acid numbers 78−x to 78; and ending at any of amino acid numbers 79+((n−2)−x), in which x varies from 0 to n−2.
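The edge-portion coordinates above can be enumerated directly. The following is an illustrative sketch, not part of the original disclosure: the variant sequence is assembled from the two sequences quoted in the comparison report, and the function name and bounds check are assumptions added for the example. For each x from 0 to n−2 it takes the window from 78−x to 79+((n−2)−x), which always has length n and always spans the junction residues 78 and 79.

```python
from typing import List, Tuple

# Sequences copied from the comparison report; names are illustrative.
HEAD = ("MKLLMVLMLAALSQHCYAGSGCPLLENVISKTINPQVSKTEYKELLQEFIDD"
        "NATTNAIDELKECFLNQTDETLSNVE")   # amino acids 1-78
TAIL = "QLIYDSSLCDLF"                    # amino acids 79-90
VARIANT = HEAD + TAIL
JUNCTION = 78  # last residue of the head; residue 79 starts the tail


def edge_windows(seq: str, n: int, junction: int = JUNCTION) -> List[Tuple[int, int, str]]:
    """List (start, end, peptide) for every length-n window spanning the junction.

    Positions are 1-based and inclusive. Windows that would run past
    either end of the sequence are skipped (an added safeguard).
    """
    windows = []
    for x in range(0, n - 1):                 # x = 0 .. n-2
        start = junction - x
        end = junction + 1 + ((n - 2) - x)
        if start >= 1 and end <= len(seq):
            windows.append((start, end, seq[start - 1:end]))
    return windows


for start, end, peptide in edge_windows(VARIANT, 10):
    print(start, end, peptide)
```

With n = 10 this yields nine windows, each ten residues long and each containing the junction pair EQ.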

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

The glycosylation sites of variant protein HSU33147_PEA_1_P5 (SEQ ID NO:1415), as compared to the known protein Mammaglobin A precursor (SEQ ID NO:1416), are described in Table 1107 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).

TABLE 1107 Glycosylation site(s)
Position(s) on known amino acid sequence | Present in variant protein? | Position in variant protein?
68 | yes | 68
53 | yes | 53
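The text does not state how the glycosylation sites were identified. As a hedged illustration, one common heuristic for N-linked sites is to scan for the sequon N-X-S/T, where X is any residue except proline. The sketch below applies that heuristic, which is an assumption and not the method of the disclosure, to the variant sequence assembled from the comparison report; the function name is hypothetical.

```python
import re
from typing import List

# Variant sequence assembled from the two sequences in the comparison report.
VARIANT = ("MKLLMVLMLAALSQHCYAGSGCPLLENVISKTINPQVSKTEYKELLQEFIDD"
           "NATTNAIDELKECFLNQTDETLSNVE"
           "QLIYDSSLCDLF")

# N, then any residue except P, then S or T. The lookahead lets
# overlapping sequons be reported as well.
SEQUON = re.compile(r"(?=N[^P][ST])")


def n_glycosylation_sequons(seq: str) -> List[int]:
    """Return 1-based positions of asparagines that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(seq)]


print(n_glycosylation_sequons(VARIANT))
```

For this sequence the scan reports positions 53 and 68, the same positions listed in Table 1107.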

Variant protein HSU33147_PEA_1_P5 (SEQ ID NO:1415) is encoded by the following transcript(s): HSU33147_PEA_1_T1 (SEQ ID NO:1464), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSU33147_PEA_1_T1 (SEQ ID NO:1464) is shown in bold; this coding portion starts at position 72 and ends at position 341. The transcript also has the following SNPs as listed in Table 1108 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSU33147_PEA_1_P5 (SEQ ID NO:1415) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1108 Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
84 | A -> C | No
124 | C -> | No
396 | A -> G | No

As noted above, cluster HSU33147 features 5 segment(s), which were listed in Table 1103 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster HSU33147_PEA_1_node_0 (SEQ ID NO:1276) according to the present invention is supported by 38 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU33147_PEA_1_T1 (SEQ ID NO:1464) and HSU33147_PEA_1_T2 (SEQ ID NO:1465). Table 1109 below describes the starting and ending position of this segment on each transcript.

TABLE 1109 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSU33147_PEA_1_T1 (SEQ ID NO:1464) | 1 | 126
HSU33147_PEA_1_T2 (SEQ ID NO:1465) | 1 | 126

Segment cluster HSU33147_PEA_1_node_2 (SEQ ID NO:1277) according to the present invention is supported by 44 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU33147_PEA_1_T1 (SEQ ID NO:1464) and HSU33147_PEA_1_T2 (SEQ ID NO:1465). Table 1110 below describes the starting and ending position of this segment on each transcript.

TABLE 1110 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSU33147_PEA_1_T1 (SEQ ID NO:1464) | 127 | 305
HSU33147_PEA_1_T2 (SEQ ID NO:1465) | 127 | 305

Segment cluster HSU33147_PEA_1_node_4 (SEQ ID NO:1278) according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU33147_PEA_1_T2 (SEQ ID NO:1465). Table 1111 below describes the starting and ending position of this segment on each transcript.

TABLE 1111 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSU33147_PEA_1_T2 (SEQ ID NO:1465) | 315 | 907

Segment cluster HSU33147_PEA_1_node_7 (SEQ ID NO:1279) according to the present invention is supported by 35 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU33147_PEA_1_T1 (SEQ ID NO:1464). Table 1112 below describes the starting and ending position of this segment on each transcript.

TABLE 1112 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSU33147_PEA_1_T1 (SEQ ID NO:1464) | 306 | 516

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.

Segment cluster HSU33147_PEA_1_node_3 (SEQ ID NO:1280) according to the present invention can be found in the following transcript(s): HSU33147_PEA_1_T2 (SEQ ID NO:1465). Table 1113 below describes the starting and ending position of this segment on each transcript.

TABLE 1113 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
HSU33147_PEA_1_T2 (SEQ ID NO:1465) | 306 | 314

Variant protein alignment to the previously known protein:

Description for Cluster R20779

Cluster R20779 features 1 transcript(s) and 24 segment(s) of interest, the names for which are given in Tables 1114 and 1115, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 1116.

TABLE 1114 Transcripts of interest
Transcript Name | Sequence ID No.
R20779_T7 | 134

TABLE 1115 Segments of interest
Segment Name | Sequence ID No.
R20779_node_0 | 913
R20779_node_2 | 914
R20779_node_7 | 915
R20779_node_9 | 916
R20779_node_18 | 917
R20779_node_21 | 918
R20779_node_24 | 919
R20779_node_27 | 920
R20779_node_28 | 921
R20779_node_30 | 922
R20779_node_31 | 923
R20779_node_32 | 924
R20779_node_1 | 925
R20779_node_3 | 926
R20779_node_10 | 927
R20779_node_11 | 928
R20779_node_14 | 929
R20779_node_17 | 930
R20779_node_19 | 931
R20779_node_20 | 932
R20779_node_22 | 933
R20779_node_23 | 934
R20779_node_25 | 935
R20779_node_29 | 936

TABLE 1116 Proteins of interest
Protein Name | Sequence ID No. | Corresponding Transcript(s)
R20779_P2 | 1402 | R20779_T7 (SEQ ID NO:134)

These sequences are variants of the known protein Stanniocalcin 2 precursor (SwissProt accession identifier STC2_HUMAN; known also according to the synonyms STC-2; Stanniocalcin-related protein; STCRP; STC-related protein), SEQ ID NO:1458, referred to herein as the previously known protein.

Protein Stanniocalcin 2 precursor (SEQ ID NO:1458) is known or believed to have the following function(s): Has an anti-hypocalcemic action on calcium and phosphate homeostasis. The sequence for protein Stanniocalcin 2 precursor is given at the end of the application, as “Stanniocalcin 2 precursor amino acid sequence”. Protein Stanniocalcin 2 precursor localization is believed to be Secreted (Potential).

The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: cell surface receptor linked signal transduction; cell-cell signaling; nutritional response pathway, which are annotation(s) related to Biological Process; hormone, which are annotation(s) related to Molecular Function; and extracellular, which are annotation(s) related to Cellular Component.

The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <dot expasy dot ch/sprot/>; or LocusLink, available from <dot ncbi dot nlm dot nih dot gov/projects/LocusLink/>.

Cluster R20779 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term “number” in the right hand column of the table and the numbers on the y-axis of FIG. 44 refer to weighted expression of ESTs in each category, as “parts per million” (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).

Overall, the following results were obtained as shown with regard to the histograms in FIG. 44 and Table 1117. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: epithelial malignant tumors, a mixture of malignant tumors from different tissues and lung malignant tumors.

TABLE 1117 Normal tissue distribution
Name of Tissue | Number
bone | 825
brain | 0
colon | 0
epithelial | 32
general | 38
kidney | 22
liver | 9
lung | 11
lymph nodes | 0
breast | 215
muscle | 35
ovary | 36
pancreas | 4
prostate | 80
skin | 99
stomach | 0
uterus | 4

TABLE 1118 P values and ratios for expression in cancerous tissue
Name of Tissue | P1 | P2 | SP1 | R3 | SP2 | R4
bone | 5.9e−01 | 7.4e−01 | 1 | 0.2 | 1 | 0.1
brain | 2.5e−02 | 1.6e−02 | 2.2e−01 | 6.0 | 3.5e−02 | 8.0
colon | 1.7e−01 | 1.7e−01 | 1 | 1.3 | 7.7e−01 | 1.5
epithelial | 1.7e−01 | 1.5e−03 | 5.9e−01 | 1.0 | 2.0e−04 | 2.0
general | 2.4e−02 | 6.2e−07 | 7.6e−01 | 0.8 | 4.6e−05 | 1.6
kidney | 4.3e−01 | 2.7e−01 | 6.2e−01 | 1.3 | 1.5e−01 | 2.0
liver | 8.3e−01 | 7.6e−01 | 1 | 0.8 | 3.3e−01 | 1.6
lung | 1.2e−01 | 1.4e−03 | 1.9e−01 | 2.9 | 1.6e−05 | 7.7
lymph nodes | 1 | 3.1e−01 | 1 | 1.0 | 1 | 1.4
breast | 6.8e−01 | 6.8e−01 | 6.9e−01 | 0.8 | 3.6e−01 | 0.8
muscle | 9.2e−01 | 4.8e−01 | 1 | 0.3 | 1.4e−03 | 1.4
ovary | 8.4e−01 | 7.1e−01 | 9.0e−01 | 0.7 | 8.6e−01 | 0.8
pancreas | 9.3e−01 | 6.8e−01 | 1 | 0.7 | 1.5e−01 | 2.0
prostate | 9.1e−01 | 5.0e−01 | 9.8e−01 | 0.4 | 5.7e−01 | 0.7
skin | 6.3e−01 | 7.5e−01 | 7.1e−01 | 0.8 | 9.5e−01 | 0.3
stomach | 1 | 4.5e−01 | 1 | 1.0 | 5.1e−01 | 1.8
uterus | 7.1e−01 | 2.6e−01 | 4.4e−01 | 1.7 | 4.1e−01 | 1.8

As noted above, cluster R20779 features 1 transcript(s), which were listed in Table 1114 above. These transcript(s) encode for protein(s) which are variant(s) of protein Stanniocalcin 2 precursor (SEQ ID NO:1458). A description of each variant protein according to the present invention is now provided.

Variant protein R20779_P2 (SEQ ID NO:1402) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R20779_T7 (SEQ ID NO:134). An alignment is given to the known protein (Stanniocalcin 2 precursor (SEQ ID NO:1458)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between R20779_P2 (SEQ ID NO:1402) and STC2_HUMAN (SEQ ID NO:1458):

1. An isolated chimeric polypeptide encoding for R20779_P2 (SEQ ID NO:1402), comprising a first amino acid sequence being at least 90% homologous to MCAERLGQFMTLALVLATFDPARGTDATNPPEGPQDRSSQQKGRLSLQNTAEIQHCLVNAGDVGCGVFECFENNSCEIRGLHGICMTFLHNAGKFDAQGKSFIKDALKCKAHALRHRFGCISRKCPAIREMVSQLQRECYLKHDLCAAAQENTRVIVEMIHFKDLLLHE corresponding to amino acids 1-169 of STC2_HUMAN (SEQ ID NO:1458), which also corresponds to amino acids 1-169 of R20779_P2 (SEQ ID NO:1402), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence CYKIEITMPKRRKVKLRD (SEQ ID NO:270) corresponding to amino acids 170-187 of R20779_P2 (SEQ ID NO:1402), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of R20779_P2 (SEQ ID NO:1402), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence CYKIEITMPKRRKVKLRD (SEQ ID NO:270) in R20779_P2 (SEQ ID NO:1402).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein R20779_P2 (SEQ ID NO:1402) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 1119 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R20779_P2 (SEQ ID NO:1402) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1119 Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
16 | L -> | No
98 | Q -> | No
171 | Y -> C | Yes
177 | M -> V | Yes

The glycosylation sites of variant protein R20779_P2 (SEQ ID NO:1402), as compared to the known protein Stanniocalcin 2 precursor (SEQ ID NO:1458), are described in Table 1120 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).

TABLE 1120 Glycosylation site(s)
Position(s) on known amino acid sequence | Present in variant protein? | Position in variant protein?
73 | yes | 73

Variant protein R20779_P2 (SEQ ID NO:1402) is encoded by the following transcript(s): R20779_T7 (SEQ ID NO:134), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R20779_T7 (SEQ ID NO:134) is shown in bold; this coding portion starts at position 1397 and ends at position 1957. The transcript also has the following SNPs as listed in Table 1121 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R20779_P2 (SEQ ID NO:1402) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1121 Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
1442 | T -> | No
1690 | G -> | No
1732 | C -> T | Yes
1867 | G -> T | Yes
1908 | A -> G | Yes
1925 | A -> G | Yes
1968 | G -> A | Yes
2087 | C -> T | No
2138 | C -> T | Yes
2270 | C -> | No
2443 | A -> | No
2478 | G -> | No
2479 | C -> A | No
2616 | C -> A | No
2941 | C -> | No
3196 | -> A | No
3479 | T -> G | Yes
4290 | C -> T | Yes
4358 | G -> A | Yes
5363 | G -> A | No
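The relationship between the nucleotide SNP positions of Table 1121 and the amino acid positions of Table 1119 can be sketched as follows. The sketch assumes that positions are 1-based and that the coding portion of R20779_T7 (positions 1397 to 1957) is read in frame from its first nucleotide; the function name `codon_index` is hypothetical and is not taken from the application.

```python
from typing import Optional

CDS_START = 1397  # first coding position of R20779_T7, as stated above
CDS_END = 1957    # last coding position of R20779_T7, as stated above


def codon_index(nt_pos: int, cds_start: int = CDS_START,
                cds_end: int = CDS_END) -> Optional[int]:
    """Map a 1-based nucleotide position to a 1-based amino acid position.

    Returns None when the position lies outside the coding portion.
    """
    if nt_pos < cds_start or nt_pos > cds_end:
        return None
    return (nt_pos - cds_start) // 3 + 1


# Nucleotide positions taken from the first rows of Table 1121.
snp_positions = [1442, 1690, 1732, 1867, 1908, 1925, 1968, 2087]
mapped = {pos: codon_index(pos) for pos in snp_positions}
```

Under these assumptions, positions 1442, 1690, 1908 and 1925 map to amino acids 16, 98, 171 and 177, which are the positions listed in Table 1119. Positions 1732 and 1867 fall inside the coding portion but have no entry in Table 1119, which would be consistent with silent changes, and positions beyond 1957 lie outside the coding portion.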

As noted above, cluster R20779 features 24 segment(s), which were listed in Table 1115 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster R20779_node_0 (SEQ ID NO:913) according to the present invention is supported by 31 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R20779_T7 (SEQ ID NO:134). Table 1122 below describes the starting and ending position of this segment on each transcript.

TABLE 1122 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R20779_T7 (SEQ ID NO: 134) | 1 | 1298

Segment cluster R20779_node_2 (SEQ ID NO:914) according to the present invention is supported by 55 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R20779_T7 (SEQ ID NO:134). Table 1123 below describes the starting and ending position of this segment on each transcript.

TABLE 1123 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R20779_T7 (SEQ ID NO: 134) | 1337 | 1506

Segment cluster R20779_node_7 (SEQ ID NO:915) according to the present invention is supported by 63 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R20779_T7 (SEQ ID NO:134). Table 1124 below describes the starting and ending position of this segment on each transcript.

TABLE 1124 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R20779_T7 (SEQ ID NO: 134) | 1548 | 1690

Segment cluster R20779_node_9 (SEQ ID NO:916) according to the present invention is supported by 66 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R20779_T7 (SEQ ID NO:134). Table 1125 below describes the starting and ending position of this segment on each transcript.

TABLE 1125 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R20779_T7 (SEQ ID NO: 134) | 1691 | 1838

Segment cluster R20779_node_18 (SEQ ID NO:917) according to the present invention is supported by 61 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R20779_T7 (SEQ ID NO:134). Table 1126 below describes the starting and ending position of this segment on each transcript.

TABLE 1126 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R20779_T7 (SEQ ID NO: 134) | 2009 | 2176

Segment cluster R20779_node_21 (SEQ ID NO:918) according to the present invention is supported by 106 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R20779_T7 (SEQ ID NO:134). Table 1127 below describes the starting and ending position of this segment on each transcript.

TABLE 1127 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R20779_T7 (SEQ ID NO: 134) | 2219 | 2796

Segment cluster R20779_node_24 (SEQ ID NO:919) according to the present invention is supported by 100 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R20779_T7 (SEQ ID NO:134). Table 1128 below describes the starting and ending position of this segment on each transcript.

TABLE 1128 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R20779_T7 (SEQ ID NO: 134) | 2977 | 3667

Segment cluster R20779_node_27 (SEQ ID NO:920) according to the present invention is supported by 26 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R20779_T7 (SEQ ID NO:134). Table 1129 below describes the starting and ending position of this segment on each transcript.

TABLE 1129 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R20779_T7 (SEQ ID NO: 134) | 3673 | 3803

Segment cluster R20779_node_28 (SEQ ID NO:921) according to the present invention is supported by 31 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R20779_T7 (SEQ ID NO:134). Table 1130 below describes the starting and ending position of this segment on each transcript.

TABLE 1130 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R20779_T7 (SEQ ID NO: 134) | 3804 | 4050

Segment cluster R20779_node_30 (SEQ ID NO:922) according to the present invention is supported by 34 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R20779_T7 (SEQ ID NO:134). Table 1131 below describes the starting and ending position of this segment on each transcript.

TABLE 1131 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R20779_T7 (SEQ ID NO: 134) | 4068 | 4193

Segment cluster R20779_node_31 (SEQ ID NO:923) according to the present invention is supported by 46 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R20779_T7 (SEQ ID NO:134). Table 1132 below describes the starting and ending position of this segment on each transcript.

TABLE 1132 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R20779_T7 (SEQ ID NO: 134) | 4194 | 4424

Segment cluster R20779_node_32 (SEQ ID NO:924) according to the present invention is supported by 88 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R20779_T7 (SEQ ID NO:134). Table 1133 below describes the starting and ending position of this segment on each transcript.

TABLE 1133 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R20779_T7 (SEQ ID NO: 134) | 4425 | 5503

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
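The split between longer and short segments can be checked with a small sketch over the segment coordinates given for transcript R20779_T7 in Tables 1122 to 1145. It assumes that positions are 1-based and inclusive, and it treats "up to about 120 bp" as a hard cutoff of 120; both are assumptions, and the helper names are hypothetical.

```python
SHORT_LIMIT = 120  # "up to about 120 bp", treated here as a hard cutoff (assumption)

# (start, end) positions on R20779_T7, taken from Tables 1122 to 1145.
segments = [
    (1, 1298), (1337, 1506), (1548, 1690), (1691, 1838),
    (2009, 2176), (2219, 2796), (2977, 3667), (3673, 3803),
    (3804, 4050), (4068, 4193), (4194, 4424), (4425, 5503),  # Tables 1122-1133
    (1299, 1336), (1507, 1547), (1839, 1849), (1850, 1902),
    (1903, 1975), (1976, 2008), (2177, 2188), (2189, 2218),
    (2797, 2899), (2900, 2976), (3668, 3672), (4051, 4067),  # Tables 1134-1145
]


def length(seg):
    """Length in nucleotides of a 1-based, inclusive (start, end) segment."""
    start, end = seg
    return end - start + 1


def tiles_without_gaps(segs):
    """True if the sorted segments follow one another with no gap or overlap."""
    ordered = sorted(segs)
    return all(prev[1] + 1 == nxt[0] for prev, nxt in zip(ordered, ordered[1:]))


short = [s for s in segments if length(s) <= SHORT_LIMIT]
long_ = [s for s in segments if length(s) > SHORT_LIMIT]
contiguous = tiles_without_gaps(segments)
total = sum(length(s) for s in segments)
```

With these coordinates, the twelve segments of Tables 1134 to 1145 are the ones at or below the cutoff, and the 24 segments together cover positions 1 to 5503 of the transcript without gaps or overlaps.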

Segment cluster R20779_node_1 (SEQ ID NO:925) according to the present invention is supported by 27 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R20779_T7 (SEQ ID NO:134). Table 1134 below describes the starting and ending position of this segment on each transcript.

TABLE 1134 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R20779_T7 (SEQ ID NO: 134) | 1299 | 1336

Segment cluster R20779_node_3 (SEQ ID NO:926) according to the present invention is supported by 52 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R20779_T7 (SEQ ID NO:134). Table 1135 below describes the starting and ending position of this segment on each transcript.

TABLE 1135 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R20779_T7 (SEQ ID NO: 134) | 1507 | 1547

Segment cluster R20779_node_1 (SEQ ID NO:927) according to the present invention can be found in the following transcript(s): R20779_T7 (SEQ ID NO:134). Table 1136 below describes the starting and ending position of this segment on each transcript.

TABLE 1136 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R20779_T7 (SEQ ID NO: 134) | 1839 | 1849

Segment cluster R20779_node_11 (SEQ ID NO:928) according to the present invention is supported by 58 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R20779_T7 (SEQ ID NO:134). Table 1137 below describes the starting and ending position of this segment on each transcript.

TABLE 1137 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R20779_T7 (SEQ ID NO: 134) | 1850 | 1902

Segment cluster R20779_node_14 (SEQ ID NO:929) according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R20779_T7 (SEQ ID NO:134). Table 1138 below describes the starting and ending position of this segment on each transcript.

TABLE 1138 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R20779_T7 (SEQ ID NO: 134) | 1903 | 1975

Segment cluster R20779_node_17 (SEQ ID NO:930) according to the present invention is supported by 54 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R20779_T7 (SEQ ID NO:134). Table 1139 below describes the starting and ending position of this segment on each transcript.

TABLE 1139 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R20779_T7 (SEQ ID NO: 134) | 1976 | 2008

Segment cluster R20779_node_19 (SEQ ID NO:931) according to the present invention can be found in the following transcript(s): R20779_T7 (SEQ ID NO:134). Table 1140 below describes the starting and ending position of this segment on each transcript.

TABLE 1140 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R20779_T7 (SEQ ID NO: 134) | 2177 | 2188

Segment cluster R20779_node_20 (SEQ ID NO:932) according to the present invention is supported by 53 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R20779_T7 (SEQ ID NO:134). Table 1141 below describes the starting and ending position of this segment on each transcript.

TABLE 1141 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R20779_T7 (SEQ ID NO: 134) | 2189 | 2218

Segment cluster R20779_node_22 (SEQ ID NO:933) according to the present invention is supported by 76 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R20779_T7 (SEQ ID NO:134). Table 1142 below describes the starting and ending position of this segment on each transcript.

TABLE 1142 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R20779_T7 (SEQ ID NO: 134) | 2797 | 2899

Segment cluster R20779_node_23 (SEQ ID NO:934) according to the present invention is supported by 81 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R20779_T7 (SEQ ID NO:134). Table 1143 below describes the starting and ending position of this segment on each transcript.

TABLE 1143 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R20779_T7 (SEQ ID NO: 134) | 2900 | 2976

Segment cluster R20779_node_25 (SEQ ID NO:935) according to the present invention can be found in the following transcript(s): R20779_T7 (SEQ ID NO:134). Table 1144 below describes the starting and ending position of this segment on each transcript.

TABLE 1144 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R20779_T7 (SEQ ID NO: 134) | 3668 | 3672

Segment cluster R20779_node_29 (SEQ ID NO:936) according to the present invention can be found in the following transcript(s): R20779_T7 (SEQ ID NO:134). Table 1145 below describes the starting and ending position of this segment on each transcript.

TABLE 1145 Segment location on transcripts
Transcript name | Segment starting position | Segment ending position
R20779_T7 (SEQ ID NO: 134) | 4051 | 4067

Variant protein alignment to the previously known protein:

Description for Cluster R38144

Cluster R38144 features 6 transcript(s) and 24 segment(s) of interest, the names for which are given in Tables 1146 and 1147, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 1148.

TABLE 1146 Transcripts of interest
Transcript Name | Sequence ID No.
R38144_PEA_2_T6 | 135
R38144_PEA_2_T10 | 136
R38144_PEA_2_T13 | 137
R38144_PEA_2_T15 | 138
R38144_PEA_2_T19 | 139
R38144_PEA_2_T27 | 140

TABLE 1147 Segments of interest
Segment Name | Sequence ID No.
R38144_PEA_2_node_21 | 937
R38144_PEA_2_node_26 | 938
R38144_PEA_2_node_29 | 939
R38144_PEA_2_node_31 | 940
R38144_PEA_2_node_46 | 941
R38144_PEA_2_node_47 | 942
R38144_PEA_2_node_49 | 943
R38144_PEA_2_node_0 | 944
R38144_PEA_2_node_1 | 945
R38144_PEA_2_node_4 | 946
R38144_PEA_2_node_5 | 947
R38144_PEA_2_node_7 | 948
R38144_PEA_2_node_11 | 949
R38144_PEA_2_node_14 | 950
R38144_PEA_2_node_15 | 951
R38144_PEA_2_node_16 | 952
R38144_PEA_2_node_19 | 953
R38144_PEA_2_node_20 | 954
R38144_PEA_2_node_36 | 955
R38144_PEA_2_node_37 | 956
R38144_PEA_2_node_43 | 957
R38144_PEA_2_node_44 | 958
R38144_PEA_2_node_45 | 959
R38144_PEA_2_node_51 | 960

TABLE 1148 Proteins of interest
Protein Name | Sequence ID No. | Corresponding Transcript(s)
R38144_PEA_2_P6 | 1403 | R38144_PEA_2_T6 (SEQ ID NO: 135)
R38144_PEA_2_P13 | 1404 | R38144_PEA_2_T13 (SEQ ID NO: 137)
R38144_PEA_2_P15 | 1405 | R38144_PEA_2_T15 (SEQ ID NO: 138)
R38144_PEA_2_P19 | 1406 | R38144_PEA_2_T19 (SEQ ID NO: 139)
R38144_PEA_2_P24 | 1407 | R38144_PEA_2_T27 (SEQ ID NO: 140)
R38144_PEA_2_P36 | 1408 | R38144_PEA_2_T10 (SEQ ID NO: 136)

These sequences are variants of the known protein Putative alpha-mannosidase C20orf31 precursor (SwissProt accession identifier CT31_HUMAN; known also according to the synonyms EC 3.2.1), SEQ ID NO:1459, referred to herein as the previously known protein.

The sequence for protein Putative alpha-mannosidase C20orf31 precursor (SEQ ID NO:1459) is given at the end of the application, as “Putative alpha-mannosidase C20orf31 precursor amino acid sequence”. Known polymorphisms for this sequence are as shown in Table 1149.

TABLE 1149 Amino acid mutations for Known Protein
SNP position(s) on amino acid sequence | Comment
456 | A -> T. /FTId = VAR_012165.
511 | S -> C

Protein Putative alpha-mannosidase C20orf31 precursor (SEQ ID NO:1459) localization is believed to be Secreted (Potential).

The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: carbohydrate metabolism; N-linked glycosylation, which are annotation(s) related to Biological Process; mannosyl-oligosaccharide 1,2-alpha-mannosidase; calcium binding; hydrolase, acting on glycosyl bonds, which are annotation(s) related to Molecular Function; and membrane, which are annotation(s) related to Cellular Component.

The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <dot expasy dot ch/sprot/>; or Locuslink, available from <dot ncbi dot nlm dot nih dot gov/projects/LocusLink/>.

Cluster R38144 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term “number” in the right hand column of the table and the numbers on the y-axis of FIG. 45 refer to weighted expression of ESTs in each category, as “parts per million” (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
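The "parts per million" figure described above can be sketched as a simple ratio. The weighting applied to EST expression is not described in this section, so the sketch uses an unweighted ratio; the function name `parts_per_million` and the example counts are hypothetical and are not taken from the application.

```python
def parts_per_million(cluster_ests: float, category_ests: float) -> float:
    """Expression of a cluster's ESTs relative to all ESTs in a category, in ppm."""
    if category_ests <= 0:
        raise ValueError("category_ests must be positive")
    if cluster_ests < 0:
        raise ValueError("cluster_ests must not be negative")
    return 1_000_000 * cluster_ests / category_ests


# Hypothetical counts, for illustration only.
example_ppm = parts_per_million(cluster_ests=11, category_ests=1_000_000)
```

Expressing the ratio per million ESTs makes values comparable across tissue categories that contain very different total numbers of ESTs.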

Overall, the following results were obtained as shown with regard to the histograms in FIG. 45 and Table 1150. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: epithelial malignant tumors, lung malignant tumors, skin malignancies and gastric carcinoma.

TABLE 1150 Normal tissue distribution
Name of Tissue | Number
Adrenal | 40
Bladder | 41
Bone | 38
Brain | 16
Colon | 37
Epithelial | 18
General | 31
head and neck | 50
Kidney | 26
Liver | 4
Lung | 11
lymph nodes | 47
Breast | 52
Ovary | 7
Pancreas | 20
Prostate | 0
Skin | 13
Stomach | 0
Uterus | 0

TABLE 1151 P values and ratios for expression in cancerous tissue
Name of Tissue | P1 | P2 | SP1 | R3 | SP2 | R4
Adrenal | 9.2e−01 | 6.9e−01 | 1 | 0.5 | 7.8e−01 | 0.9
Bladder | 7.6e−01 | 8.1e−01 | 8.1e−01 | 0.9 | 9.0e−01 | 0.7
Bone | 6.6e−01 | 8.5e−01 | 1 | 0.6 | 1 | 0.6
Brain | 8.0e−02 | 6.0e−02 | 4.7e−02 | 3.0 | 1.6e−02 | 3.0
colon | 7.7e−01 | 7.5e−01 | 1 | 0.5 | 3.5e−01 | 0.8
epithelial | 2.0e−01 | 4.8e−03 | 1.7e−01 | 1.4 | 2.7e−16 | 5.2
general | 3.9e−01 | 2.2e−02 | 7.8e−01 | 0.9 | 2.1e−19 | 2.9
head and neck | 3.4e−01 | 5.6e−01 | 4.6e−01 | 1.4 | 7.5e−01 | 0.9
kidney | 8.3e−01 | 7.7e−01 | 4.4e−01 | 1.4 | 8.5e−02 | 1.6
liver | 9.1e−01 | 6.0e−01 | 1 | 0.9 | 1.1e−01 | 1.8
lung | 1.6e−02 | 1.5e−02 | 9.5e−02 | 3.8 | 1.6e−05 | 6.6
lymph nodes | 7.1e−01 | 7.8e−01 | 1 | 0.3 | 1.2e−04 | 1.0
breast | 9.1e−01 | 9.1e−01 | 1 | 0.5 | 9.7e−01 | 0.6
ovary | 5.0e−01 | 2.9e−01 | 4.7e−01 | 1.7 | 7.0e−02 | 2.2
pancreas | 7.2e−01 | 4.2e−01 | 8.1e−01 | 0.8 | 3.0e−02 | 1.8
prostate | 7.9e−01 | 5.7e−01 | 3.0e−01 | 2.5 | 1.8e−04 | 3.0
skin | 9.2e−01 | 8.7e−02 | 1 | 0.5 | 3.0e−05 | 4.1
stomach | 3.0e−01 | 5.5e−02 | 2.5e−01 | 3.0 | 9.2e−04 | 6.1
uterus | 2.1e−01 | 9.4e−02 | 4.4e−01 | 2.0 | 5.1e−01 | 1.9

As noted above, cluster R38144 features 6 transcript(s), which were listed in Table 1146 above. These transcript(s) encode for protein(s) which are variant(s) of protein Putative alpha-mannosidase C20orf31 precursor (SEQ ID NO:1459). A description of each variant protein according to the present invention is now provided.

Variant protein R38144_PEA_2_P6 (SEQ ID NO:1403) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R38144_PEA_2_T6 (SEQ ID NO:135). An alignment is given to the known protein (Putative alpha-mannosidase C20orf31 precursor (SEQ ID NO:1459)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between R38144_PEA_2_P6 (SEQ ID NO:1403) and CT31_HUMAN (SEQ ID NO:1459):

1. An isolated chimeric polypeptide encoding for R38144_PEA_2_P6 (SEQ ID NO:1403), comprising a first amino acid sequence being at least 90% homologous to MPFRLLIPLGLLCALLPQHHGAPGPDGSAPDPAHYRERVKAMFYHAYDSYLENAFPFDELRPLPTCDGHDTWGSFSLTLIDALDTLLILGNVSEFQRVVEVLQDSVDFDIDVNASVFETNIRVVGGLLSAHLLSKKAGVEVEAGWPCSGPLLRMAEEAARKLLPAFQTPTGMPYGTVNLLHGVNPGETPVTCTAGIGTFIVEFATLSSLTGDPVFEDVARVALMRLWESRSDIGLVGNHIDVLTGKWVAQDAGIGAGVDSYFEYLVKGAILLQDKKLMAMFLEYNKAIRNYTRFDDWYLWVQMYKGTVSMPVFQSLEAYWPGLQSLIGDIDNAMRTFLNYYTVWKQFGGLPEFYNIPQGYTVEKREGYPLRPELIESAMYLYRATGDPTLLELGRDAVESIEKISKVECGFAT corresponding to amino acids 1-412 of CT31_HUMAN (SEQ ID NO:1459), which also corresponds to amino acids 1-412 of R38144_PEA_2_P6 (SEQ ID NO:1403), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LASFSHMSDQRSARPQAGQPHGVVLPGRDCEIPLPPV (SEQ ID NO: 268) corresponding to amino acids 413-449 of R38144_PEA_2_P6 (SEQ ID NO:1403), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of R38144_PEA_2_P6 (SEQ ID NO:1403), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LASFSHMSDQRSARPQAGQPHGVVLPGRDCEIPLPPV (SEQ ID NO: 268) in R38144_PEA_2_P6 (SEQ ID NO:1403).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein R38144_PEA_2_P6 (SEQ ID NO:1403) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 1152 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R38144_PEA_2_P6 (SEQ ID NO:1403) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1152 Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
10 | G -> | No
54 | A -> V | Yes
55 | F -> L | Yes
73 | S -> I | Yes
87 | I -> | No
145 | P -> | No
145 | P -> A | No
164 | A -> G | No
164 | A -> | No
203 | A -> G | No
203 | A -> | No
211 | D -> | No
236 | G -> | No
265 | V -> G | No
285 | K -> | No
294 | D -> N | No
305 | G -> E | No
323 | Q -> R | No
346 | F -> | No

The glycosylation sites of variant protein R38144_PEA_2_P6 (SEQ ID NO:1403), as compared to the known protein Putative alpha-mannosidase C20orf31 precursor (SEQ ID NO:1459), are described in Table 1153 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).

TABLE 1153 Glycosylation site(s)
Position(s) on known amino acid sequence | Present in variant protein? | Position in variant protein?
450 | no |
289 | yes | 289
112 | yes | 112
90 | yes | 90
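The pattern in Table 1153 can be reproduced with a simplified sketch: a site of the known protein is carried over at the same position when it falls within the region shared with the variant (amino acids 1-412 for R38144_PEA_2_P6, per the comparison report above), and is absent otherwise. This is an illustrative assumption, not the method used in the application; it ignores sequence context near the junction, and the function name `site_in_variant` is hypothetical.

```python
from typing import Optional

SHARED_HEAD = 412  # amino acids 1-412 are shared with CT31_HUMAN, per the comparison report


def site_in_variant(site: int, shared_head_len: int) -> Optional[int]:
    """Return the position of a known-protein site in the variant.

    Returns None when the site lies beyond the shared N-terminal region.
    """
    if 1 <= site <= shared_head_len:
        return site
    return None


# Positions on the known protein, from the first column of Table 1153.
known_sites = [450, 289, 112, 90]
result = {site: site_in_variant(site, SHARED_HEAD) for site in known_sites}
```

Under this assumption, the sites at positions 289, 112 and 90 keep their positions, while the site at position 450 is not present in the variant, matching the entries of Table 1153.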

Variant protein R38144_PEA_2_P6 (SEQ ID NO:1403) is encoded by the following transcript(s): R38144_PEA_2_T6 (SEQ ID NO:135), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R38144_PEA_2_T6 (SEQ ID NO:135) is shown in bold; this coding portion starts at position 91 and ends at position 1437. The transcript also has the following SNPs as listed in Table 1154 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R38144_PEA_2_P6 (SEQ ID NO:1403) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1154 Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
120 | C -> | No
251 | C -> T | Yes
253 | T -> C | Yes
308 | G -> T | Yes
312 | T -> C | No
350 | T -> | No
523 | C -> | No
523 | C -> G | No
581 | C -> | No
581 | C -> G | No
698 | C -> | No
698 | C -> G | No
723 | C -> | No
798 | C -> | No
798 | C -> G | No
849 | -> C | No
849 | -> G | No
884 | T -> G | No
901 | -> C | No
901 | -> T | No
943 | A -> | No
970 | G -> A | No
1004 | G -> A | No
1058 | A -> G | No
1126 | T -> | No
1218 | C -> T | Yes
1392 | A -> G | No
1425 | T -> C | No
1481 | G -> A | Yes
1560 | C -> T | No
1566 | C -> | No
1644 | G -> A | Yes
1646 | A -> T | No
1763 | A -> | No
1763 | A -> C | No
1781 | C -> T | Yes
1799 | C -> | No
1799 | C -> G | No
1844 | T -> G | No
1855 | A -> C | Yes
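As a consistency check, the coding coordinates stated for transcript R38144_PEA_2_T6 can be compared with the length of the encoded variant protein. The sketch assumes that the stated start and end positions are 1-based and inclusive and that the stop codon is not counted within the coding portion; the helper name `codon_count` is hypothetical.

```python
def codon_count(cds_start: int, cds_end: int) -> int:
    """Number of codons in a coding portion given 1-based, inclusive coordinates."""
    n_nt = cds_end - cds_start + 1
    if n_nt <= 0 or n_nt % 3 != 0:
        raise ValueError("coding portion length must be a positive multiple of 3")
    return n_nt // 3


# Coding portion of R38144_PEA_2_T6, as stated above.
T6_CDS_START, T6_CDS_END = 91, 1437

# R38144_PEA_2_P6: 412 residues shared with CT31_HUMAN plus the unique tail.
HEAD_LEN = 412
TAIL = "LASFSHMSDQRSARPQAGQPHGVVLPGRDCEIPLPPV"  # amino acids 413-449

n_codons = codon_count(T6_CDS_START, T6_CDS_END)
protein_len = HEAD_LEN + len(TAIL)
```

Under these assumptions, the 1347-nucleotide coding portion corresponds to 449 codons, which agrees with the 449 amino acids obtained by joining the 412-residue head and the 37-residue tail.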

Variant protein R38144_PEA_2_P13 (SEQ ID NO:1404) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R38144_PEA_2_T13 (SEQ ID NO:137). An alignment is given to the known protein (Putative alpha-mannosidase C20orf31 precursor (SEQ ID NO:1459)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between R38144_PEA_2_P13 (SEQ ID NO:1404) and CT31_HUMAN (SEQ ID NO:1459):

1. An isolated chimeric polypeptide encoding for R38144_PEA_2_P13 (SEQ ID NO:1404), comprising a first amino acid sequence being at least 90% homologous to MPFRLLIPLGLLCALLPQHHGAPGPDGSAPDPAHYRERVKAMFYHAYDSYLENAFPFDELRPLTCDGHDTWGSFSLTLIDALDTLLILGNVSEFQRVVEVLQDSVDFDIDVNASVFETNIRVVGGLLSAHLLSKKAGVEVEAGWPCSGPLLRMAEEAARKLLPAFQTPTGMPYGTVNLLHGVNPGETPVTCTAGIGTFIVEFATLSSLTGDPVFEDVARVALMRLWESRSDIGLVGNHIDVLTGKWVAQDAGIGAGVDSYFEYLVKGAILLQDKKLMAMFLEYNKAIRNYTRFDDWYLWVQMYKGTVSMPVFQSLEAYWPGLQ corresponding to amino acids 1-323 of CT31_HUMAN (SEQ ID NO:1459), which also corresponds to amino acids 1-323 of R38144_PEA_2_P13 (SEQ ID NO:1404), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NLLKAQCTSTVPRGIPPS (SEQ ID NO: 269) corresponding to amino acids 324-341 of R38144_PEA_2_P13 (SEQ ID NO:1404), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of R38144_PEA_2_P13 (SEQ ID NO:1404), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NLLKAQCTSTVPRGIPPS (SEQ ID NO: 269) in R38144_PEA_2_P13 (SEQ ID NO:1404).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein R38144_PEA_2_P13 (SEQ ID NO:1404) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 1155 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R38144_PEA_2_P13 (SEQ ID NO:1404) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1155 Amino acid mutations
SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
10 | G -> | No
54 | A -> V | Yes
55 | F -> L | Yes
73 | S -> I | Yes
87 | I -> | No
145 | P -> | No
145 | P -> A | No
164 | A -> G | No
164 | A -> | No
203 | A -> G | No
203 | A -> | No
211 | D -> | No
236 | G -> | No
265 | V -> G | No
285 | K -> | No
294 | D -> N | No
305 | G -> E | No
323 | Q -> R | No
328 | A -> V | Yes

The glycosylation sites of variant protein R38144_PEA_2_P13 (SEQ ID NO:1404), as compared to the known protein Putative alpha-mannosidase C20orf31 precursor (SEQ ID NO:1459), are described in Table 1156 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).

TABLE 1156 Glycosylation site(s)
Position(s) on known amino acid sequence | Present in variant protein? | Position in variant protein?
450 | no |
289 | yes | 289
112 | yes | 112
90 | yes | 90

Variant protein R38144_PEA_2_P13 (SEQ ID NO:1404) is encoded by the following transcript(s): R38144_PEA_2_T13 (SEQ ID NO:137), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R38144_PEA_2_T13 (SEQ ID NO:137) is shown in bold; this coding portion starts at position 91 and ends at position 1113. The transcript also has the following SNPs as listed in Table 1157 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R38144_PEA_2_P13 (SEQ ID NO:1404) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1157 Nucleic acid SNPs
SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
120 | C -> | No
251 | C -> T | Yes
253 | T -> C | Yes
308 | G -> T | Yes
312 | T -> C | No
350 | T -> | No
523 | C -> | No
523 | C -> G | No
581 | C -> | No
581 | C -> G | No
698 | C -> | No
698 | C -> G | No
723 | C -> | No
798 | C -> | No
798 | C -> G | No
849 | -> C | No
849 | -> G | No
884 | T -> G | No
901 | -> C | No
901 | -> T | No
943 | A -> | No
970 | G -> A | No
1004 | G -> A | No
1058 | A -> G | No
1073 | C -> T | Yes
1222 | A -> G | No
1255 | T -> C | No
1311 | G -> A | Yes
1390 | C -> T | No
1396 | C -> | No
1474 | G -> A | Yes
1476 | A -> T | No
1593 | A -> | No
1593 | A -> C | No
1611 | C -> T | Yes
1629 | C -> | No
1629 | C -> G | No
1674 | T -> G | No
1685 | A -> C | Yes

Variant protein R38144_PEA_2_P15 (SEQ ID NO:1405) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R38144_PEA_2_T15 (SEQ ID NO:138). An alignment is given to the known protein (Putative alpha-mannosidase C20orf31 precursor (SEQ ID NO:1459)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between R38144_PEA_(—)2_P15 (SEQ ID NO:1405) andCT31_HUMAN (SEQ ID NO:1459):

1. An isolated chimeric polypeptide encoding for R38144_PEA_2_P15 (SEQ ID NO:1405), comprising a first amino acid sequence being at least 90% homologous to MPFRLLIPLGLLCALLPQHHGAPGPDGSAPDPAHYRERVKAMFYHAYDSYLENAFPFDELRPLTCDGHDTWGSFSLTLIDALDTLLILGNVSEFQRVVEVLQDSVDFDIDVNASVFETNIRVVGGLLSAHLLSKKAGVEVEAGWPCSGPLLRMAEEAARKLLPAFQTPTGMPYGTVNLLHGVNPGETPVTCTAGIGTFIVEFATLSSLTGDPVFEDVARVALMRLWESRSDIGLVGNHIDVLTGKWVAQDAGIGAGVDSYFEYLVKGAILLQDKKLMAMFLE corresponding to amino acids 1-282 of CT31_HUMAN (SEQ ID NO:1459), which also corresponds to amino acids 1-282 of R38144_PEA_2_P15 (SEQ ID NO:1405), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence PHWRH (SEQ ID NO: 270) corresponding to amino acids 283-287 of R38144_PEA_2_P15 (SEQ ID NO:1405), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of R38144_PEA_2_P15 (SEQ ID NO:1405), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence PHWRH (SEQ ID NO: 270) in R38144_PEA_2_P15 (SEQ ID NO:1405).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein R38144_PEA_2_P15 (SEQ ID NO:1405) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 1158 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R38144_PEA_2_P15 (SEQ ID NO:1405) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1158 Amino acid mutations

SNP position(s) on    Alternative    Previously
amino acid sequence   amino acid(s)  known SNP?
10                    G ->           No
54                    A -> V         Yes
55                    F -> L         Yes
73                    S -> I         Yes
87                    I ->           No
145                   P ->           No
145                   P -> A         No
164                   A -> G         No
164                   A ->           No
203                   A -> G         No
203                   A ->           No
211                   D ->           No
236                   G ->           No
265                   V -> G         No

The glycosylation sites of variant protein R38144_PEA_2_P15 (SEQ ID NO:1405), as compared to the known protein Putative alpha-mannosidase C20orf31 precursor (SEQ ID NO:1459), are described in Table 1159 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).

TABLE 1159 Glycosylation site(s)

Position(s) on known    Present in          Position in
amino acid sequence     variant protein?    variant protein?
450                     no
289                     no
112                     yes                 112
90                      yes                 90

Variant protein R38144_PEA_2_P15 (SEQ ID NO:1405) is encoded by the following transcript(s): R38144_PEA_2_T15 (SEQ ID NO:138), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R38144_PEA_2_T15 (SEQ ID NO:138) is shown in bold; this coding portion starts at position 91 and ends at position 951. The transcript also has the following SNPs as listed in Table 1160 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R38144_PEA_2_P15 (SEQ ID NO:1405) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1160 Nucleic acid SNPs

SNP position on       Alternative    Previously
nucleotide sequence   nucleic acid   known SNP?
120                   C ->           No
251                   C -> T         Yes
253                   T -> C         Yes
308                   G -> T         Yes
312                   T -> C         No
350                   T ->           No
523                   C ->           No
523                   C -> G         No
581                   C ->           No
581                   C -> G         No
698                   C ->           No
698                   C -> G         No
723                   C ->           No
798                   C ->           No
798                   C -> G         No
849                   -> C           No
849                   -> G           No
884                   T -> G         No
901                   -> C           No
901                   -> T           No
1001                  T ->           No
1093                  C -> T         Yes
1242                  A -> G         No
1275                  T -> C         No
1331                  G -> A         Yes
1410                  C -> T         No
1416                  C ->           No
1494                  G -> A         Yes
1496                  A -> T         No
1613                  A ->           No
1613                  A -> C         No
1631                  C -> T         Yes
1649                  C ->           No
1649                  C -> G         No
1694                  T -> G         No
1705                  A -> C         Yes
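The nucleotide positions of Table 1160 and the amino acid positions of Table 1158 can be related through the stated coding start at position 91. The following is a minimal sketch, assuming 1-based positions and that position 91 is the first base of codon 1; the function names are illustrative only and are not part of the application.

```python
def codon_number(nt_pos, cds_start):
    """Return the 1-based codon (amino acid) number that contains a
    1-based nucleotide position, assuming cds_start is base 1 of codon 1."""
    if nt_pos < cds_start:
        raise ValueError("position lies upstream of the coding portion")
    return (nt_pos - cds_start) // 3 + 1


def coding_length_in_codons(cds_start, cds_end):
    """Return the number of whole codons between two inclusive positions."""
    length = cds_end - cds_start + 1
    if length % 3 != 0:
        raise ValueError("coding portion is not a multiple of three bases")
    return length // 3


# Coding portion of transcript R38144_PEA_2_T15 as stated in the text.
CDS_START, CDS_END = 91, 951

for nt in (120, 251, 253, 308, 350, 884):
    print(nt, "->", codon_number(nt, CDS_START))

print("codons:", coding_length_in_codons(CDS_START, CDS_END))
```

Under these assumptions, nucleotide positions 120, 251, 253, 308, 350 and 884 fall in codons 10, 54, 55, 73, 87 and 265, which agree with the positions listed in Table 1158, and the coding portion from 91 to 951 spans 287 codons, in agreement with the 287 amino acids described for this variant.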

Variant protein R38144_PEA_2_P19 (SEQ ID NO:1406) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R38144_PEA_2_T19 (SEQ ID NO:139). An alignment is given to the known protein (Putative alpha-mannosidase C20orf31 precursor (SEQ ID NO:1459)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between R38144_PEA_2_P19 (SEQ ID NO:1406) and CT31_HUMAN (SEQ ID NO:1459):

1. An isolated chimeric polypeptide encoding for R38144_PEA_2_P19 (SEQ ID NO:1406), comprising a first amino acid sequence being at least 90% homologous to MPFRLLIPLGLLCALLPQHHGAPGPDGSAPDPAHYRERVKAMFYHAYDSYLENAFPFDELRPLTCDGHDTWGSFSLTLIDALDTLLILGNVSEFQRVVEVLQDSVDFDIDVNASVFETNIRVVGGLLSAHLLSKKAGVEVEAGWPCSGPLLRMAEEAARKLLPAFQTPTGMPYGTVNLLHGVNPGETPVTCTAGIGTFIVEFATLSSLTGDPVFEDVARVALMRLWESRSDIGLVGNHIDVLTGKWVAQDAGIGAGVDSYFEYLVKGAILLQDKKLMAMFLEYNKAIRNYTRFDDWYLWVQMYKGTVSMPVFQSLEAYWPGLQSLIGDIDNAMRTFLNYYTVWKQFGGLPEFYNIPQGYTVEKREGYPLRPELIESAMYLYRATGDPTLLELGRDAVESIEKISKVECGFAT corresponding to amino acids 1-412 of CT31_HUMAN (SEQ ID NO:1459), which also corresponds to amino acids 1-412 of R38144_PEA_2_P19 (SEQ ID NO:1406), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence KRSRSVAQAGVQWCDHDSPQP (SEQ ID NO: 270) corresponding to amino acids 413-433 of R38144_PEA_2_P19 (SEQ ID NO:1406), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of R38144_PEA_2_P19 (SEQ ID NO:1406), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KRSRSVAQAGVQWCDHDSPQP (SEQ ID NO: 270) in R38144_PEA_2_P19 (SEQ ID NO:1406).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein R38144_PEA_2_P19 (SEQ ID NO:1406) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 1161 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R38144_PEA_2_P19 (SEQ ID NO:1406) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1161 Amino acid mutations

SNP position(s) on    Alternative    Previously
amino acid sequence   amino acid(s)  known SNP?
10                    G ->           No
54                    A -> V         Yes
55                    F -> L         Yes
73                    S -> I         Yes
87                    I ->           No
145                   P ->           No
145                   P -> A         No
164                   A -> G         No
164                   A ->           No
203                   A -> G         No
203                   A ->           No
211                   D ->           No
236                   G ->           No
265                   V -> G         No
285                   K ->           No
294                   D -> N         No
305                   G -> E         No
323                   Q -> R         No
346                   F ->           No

The glycosylation sites of variant protein R38144_PEA_2_P19 (SEQ ID NO:1406), as compared to the known protein Putative alpha-mannosidase C20orf31 precursor (SEQ ID NO:1459), are described in Table 1162 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).

TABLE 1162 Glycosylation site(s)

Position(s) on known    Present in          Position in
amino acid sequence     variant protein?    variant protein?
450                     no
289                     yes                 289
112                     yes                 112
90                      yes                 90

Variant protein R38144_PEA_2_P19 (SEQ ID NO:1406) is encoded by the following transcript(s): R38144_PEA_2_T19 (SEQ ID NO:139), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R38144_PEA_2_T19 (SEQ ID NO:139) is shown in bold; this coding portion starts at position 91 and ends at position 1389. The transcript also has the following SNPs as listed in Table 1163 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R38144_PEA_2_P19 (SEQ ID NO:1406) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1163 Nucleic acid SNPs

SNP position on       Alternative    Previously
nucleotide sequence   nucleic acid   known SNP?
120                   C ->           No
251                   C -> T         Yes
253                   T -> C         Yes
308                   G -> T         Yes
312                   T -> C         No
350                   T ->           No
523                   C ->           No
523                   C -> G         No
581                   C ->           No
581                   C -> G         No
698                   C ->           No
698                   C -> G         No
723                   C ->           No
798                   C ->           No
798                   C -> G         No
849                   -> C           No
849                   -> G           No
884                   T -> G         No
901                   -> C           No
901                   -> T           No
943                   A ->           No
970                   G -> A         No
1004                  G -> A         No
1058                  A -> G         No
1126                  T ->           No
1218                  C -> T         Yes
1446                  C ->           Yes

Variant protein R38144_PEA_2_P24 (SEQ ID NO:1407) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R38144_PEA_2_T27 (SEQ ID NO:140). An alignment is given to the known protein (Putative alpha-mannosidase C20orf31 precursor (SEQ ID NO:1459)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between R38144_PEA_2_P24 (SEQ ID NO:1407) and CT31_HUMAN (SEQ ID NO:1459):

1. An isolated chimeric polypeptide encoding for R38144_PEA_2_P24 (SEQ ID NO:1407), comprising a first amino acid sequence being at least 90% homologous to MPFRLLIPLGLLCALLPQHHGAPGPDGSAPDPAHYRERVKAMFYHAYDSYLENAFPFDELRPLTCDGHDTWGSFSLTLIDALDTLLILGNVSEFQRVVEVLQDSVDFDIDVNASVFETNIR corresponding to amino acids 1-121 of CT31_HUMAN (SEQ ID NO:1459), which also corresponds to amino acids 1-121 of R38144_PEA_2_P24 (SEQ ID NO:1407), and a second amino acid sequence being at least 90% homologous to EYNKAIRNYTRFDDWYLWVQMYKGTVSMPVFQSLEAYWPGLQSLIGDIDNAMRTFLNYYTVWKQFGGLPEFYNIPQGYTVEKREGYPLRPELIESAMYLYRATGDPTLLELGRDAVESIEKISKVECGFATIKDLRDHKLDNRMESFFLAETVKYLYLLFDPTNFIHNNGSTFDAVITPYGECILGAGGYIFNTEAHPIDPAALHCCQRLKEEQWEVEDLMREFYSLKRSRSKFQKNTVSSGPWEPPARPGTLFSPENHDQARERKPAKQKVPLLSCPSQPFTSKLALLGQVFLDSS corresponding to amino acids 282-578 of CT31_HUMAN (SEQ ID NO:1459), which also corresponds to amino acids 122-418 of R38144_PEA_2_P24 (SEQ ID NO:1407), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated chimeric polypeptide encoding for an edge portion of R38144_PEA_2_P24 (SEQ ID NO:1407), comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise RE, having a structure as follows: a sequence starting from any of amino acid numbers 121−x to 121; and ending at any of amino acid numbers 122+((n−2)−x), in which x varies from 0 to n−2.
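The edge-portion structure above defines, for a given length n, every window of n consecutive amino acids that contains both residue 121 and residue 122. The following is a minimal sketch of that enumeration, assuming 1-based inclusive numbering; the function name and the choice of n = 10 are illustrative and are not taken from the application.

```python
def edge_windows(left_residue, n):
    """List (start, end) pairs, 1-based and inclusive, for every window of
    length n that contains both residues flanking the junction."""
    if n < 2:
        raise ValueError("n must be at least 2 to span the junction")
    right_residue = left_residue + 1
    windows = []
    for x in range(n - 1):  # x varies from 0 to n-2
        start = left_residue - x                 # 121 - x
        end = right_residue + ((n - 2) - x)      # 122 + ((n - 2) - x)
        if start >= 1:  # skip windows that would begin before residue 1
            windows.append((start, end))
    return windows


# Junction between residues 121 and 122 of the variant, with n = 10.
windows = edge_windows(121, 10)
for start, end in windows:
    print(start, end)
```

Each window has length n because end − start + 1 = (122 + n − 2 − x) − (121 − x) + 1 = n, so for n = 10 the formula yields nine windows, from 121-130 (x = 0) to 113-122 (x = 8).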

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein R38144_PEA_2_P24 (SEQ ID NO:1407) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 1164 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R38144_PEA_2_P24 (SEQ ID NO:1407) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1164 Amino acid mutations

SNP position(s) on    Alternative    Previously
amino acid sequence   amino acid(s)  known SNP?
10                    G ->           No
54                    A -> V         Yes
55                    F -> L         Yes
73                    S -> I         Yes
87                    I ->           No
125                   K ->           No
134                   D -> N         No
145                   G -> E         No
163                   Q -> R         No
186                   F ->           No
266                   E -> G         No
277                   L -> P         No
296                   A -> T         Yes
322                   P -> L         No
324                   A ->           No
350                   R -> Q         Yes
351                   S -> C         No
390                   K ->           No
390                   K -> Q         No
396                   L -> F         Yes
402                   P ->           No
402                   P -> A         No
417                   S -> A         No

The glycosylation sites of variant protein R38144_PEA_2_P24 (SEQ ID NO:1407), as compared to the known protein Putative alpha-mannosidase C20orf31 precursor (SEQ ID NO:1459), are described in Table 1165 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).

TABLE 1165 Glycosylation site(s)

Position(s) on known    Present in          Position in
amino acid sequence     variant protein?    variant protein?
450                     yes                 290
289                     yes                 129
112                     yes                 112
90                      yes                 90
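The shifted positions in Table 1165 follow from the comparison report for this variant, in which amino acids 1-121 of the known protein correspond to amino acids 1-121 of the variant and amino acids 282-578 correspond to amino acids 122-418. The following is a minimal sketch of that coordinate mapping, assuming 1-based inclusive positions; the block representation and function name are illustrative only, and the sketch maps coordinates without assessing whether a site is functionally glycosylated.

```python
def map_to_variant(known_pos, blocks):
    """Map a position on the known protein to the variant protein.

    blocks is a list of (known_start, known_end, variant_start) tuples
    describing aligned regions. Returns None when the position lies in a
    region that is absent from the variant."""
    for known_start, known_end, variant_start in blocks:
        if known_start <= known_pos <= known_end:
            return variant_start + (known_pos - known_start)
    return None


# Aligned regions stated in the comparison report for R38144_PEA_2_P24.
P24_BLOCKS = [(1, 121, 1), (282, 578, 122)]

for site in (450, 289, 112, 90):
    print(site, "->", map_to_variant(site, P24_BLOCKS))
```

With these blocks, known positions 450 and 289 map to 290 and 129 on the variant, while 112 and 90 are unchanged, which agrees with the last column of Table 1165. A position such as 200, which falls in the region skipped by the variant, maps to no position.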

Variant protein R38144_PEA_2_P24 (SEQ ID NO:1407) is encoded by the following transcript(s): R38144_PEA_2_T27 (SEQ ID NO:140), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R38144_PEA_2_T27 (SEQ ID NO:140) is shown in bold; this coding portion starts at position 91 and ends at position 1344. The transcript also has the following SNPs as listed in Table 1166 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R38144_PEA_2_P24 (SEQ ID NO:1407) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1166 Nucleic acid SNPs

SNP position on       Alternative    Previously
nucleotide sequence   nucleic acid   known SNP?
120                   C ->           No
251                   C -> T         Yes
253                   T -> C         Yes
308                   G -> T         Yes
312                   T -> C         No
350                   T ->           No
463                   A ->           No
490                   G -> A         No
524                   G -> A         No
578                   A -> G         No
646                   T ->           No
738                   C -> T         Yes
887                   A -> G         No
920                   T -> C         No
976                   G -> A         Yes
1055                  C -> T         No
1061                  C ->           No
1139                  G -> A         Yes
1141                  A -> T         No
1258                  A ->           No
1258                  A -> C         No
1276                  C -> T         Yes
1294                  C ->           No
1294                  C -> G         No
1339                  T -> G         No
1350                  A -> C         Yes

Variant protein R38144_PEA_2_P36 (SEQ ID NO:1408) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R38144_PEA_2_T10 (SEQ ID NO:136). An alignment is given to the known protein (Putative alpha-mannosidase C20orf31 precursor (SEQ ID NO:1459)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between R38144_PEA_2_P36 (SEQ ID NO:1408) and AAH16184 (SEQ ID NO:1460):

1. An isolated chimeric polypeptide encoding for R38144_PEA_2_P36 (SEQ ID NO:1408), comprising a first amino acid sequence being at least 90% homologous to MPFRLLIPLGLLCALLPQHHGAPGPDGSAPDPAHYR corresponding to amino acids 1-36 of AAH16184 (SEQ ID NO:1460), which also corresponds to amino acids 1-36 of R38144_PEA_2_P36 (SEQ ID NO:1408), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence FWGMSQNSKEWLKCSRTAWTLILM (SEQ ID NO: 272) corresponding to amino acids 37-60 of R38144_PEA_2_P36 (SEQ ID NO:1408), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of R38144_PEA_2_P36 (SEQ ID NO:1408), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence FWGMSQNSKEWLKCSRTAWTLILM (SEQ ID NO: 272) in R38144_PEA_2_P36 (SEQ ID NO:1408).

Comparison report between R38144_PEA_2_P36 (SEQ ID NO:1408) and AAQ88943 (SEQ ID NO:1461):

1. An isolated chimeric polypeptide encoding for R38144_PEA_2_P36 (SEQ ID NO:1408), comprising a first amino acid sequence being at least 90% homologous to MPFRLLIPLGLLCALLPQHHGAPGPDGSAPDPAHY corresponding to amino acids 1-35 of AAQ88943 (SEQ ID NO:1461), which also corresponds to amino acids 1-35 of R38144_PEA_2_P36 (SEQ ID NO:1408), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence RFWGMSQNSKEWLKCSRTAWTLILM corresponding to amino acids 36-60 of R38144_PEA_2_P36 (SEQ ID NO:1408), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of R38144_PEA_2_P36 (SEQ ID NO:1408), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence RFWGMSQNSKEWLKCSRTAWTLILM in R38144_PEA_2_P36 (SEQ ID NO:1408).

Comparison report between R38144_PEA_2_P36 (SEQ ID NO:1408) and CT31_HUMAN (SEQ ID NO:1459):

1. An isolated chimeric polypeptide encoding for R38144_PEA_2_P36 (SEQ ID NO:1408), comprising a first amino acid sequence being at least 90% homologous to MPFRLLIPLGLLCALLPQHHGAPGPDGSAPDPAHYR corresponding to amino acids 1-36 of CT31_HUMAN (SEQ ID NO:1459), which also corresponds to amino acids 1-36 of R38144_PEA_2_P36 (SEQ ID NO:1408), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence FWGMSQNSKEWLKCSRTAWTLILM (SEQ ID NO: 272) corresponding to amino acids 37-60 of R38144_PEA_2_P36 (SEQ ID NO:1408), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of R38144_PEA_2_P36 (SEQ ID NO:1408), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence FWGMSQNSKEWLKCSRTAWTLILM (SEQ ID NO: 272) in R38144_PEA_2_P36 (SEQ ID NO:1408).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein R38144_PEA_2_P36 (SEQ ID NO:1408) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 1167 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R38144_PEA_2_P36 (SEQ ID NO:1408) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1167 Amino acid mutations

SNP position(s) on    Alternative    Previously
amino acid sequence   amino acid(s)  known SNP?
10                    G ->           No
37                    F ->           No

The glycosylation sites of variant protein R38144_PEA_2_P36 (SEQ ID NO:1408), as compared to the known protein Putative alpha-mannosidase C20orf31 precursor (SEQ ID NO:1459), are described in Table 1168 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).

TABLE 1168 Glycosylation site(s)

Position(s) on known    Present in
amino acid sequence     variant protein?
450                     no
289                     no
112                     no
90                      no

Variant protein R38144_PEA_2_P36 (SEQ ID NO:1408) is encoded by the following transcript(s): R38144_PEA_2_T10 (SEQ ID NO:136), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R38144_PEA_2_T10 (SEQ ID NO:136) is shown in bold; this coding portion starts at position 91 and ends at position 270. The transcript also has the following SNPs as listed in Table 1169 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R38144_PEA_2_P36 (SEQ ID NO:1408) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1169 Nucleic acid SNPs

SNP position on       Alternative    Previously
nucleotide sequence   nucleic acid   known SNP?
120                   C ->           No
199                   T ->           No
372                   C ->           No
372                   C -> G         No
430                   C ->           No
430                   C -> G         No
547                   C ->           No
547                   C -> G         No
572                   C ->           No
647                   C ->           No
647                   C -> G         No
698                   -> C           No
698                   -> G           No
733                   T -> G         No
750                   -> C           No
750                   -> T           No
792                   A ->           No
819                   G -> A         No
853                   G -> A         No
907                   A -> G         No
975                   T ->           No
1067                  C -> T         Yes
1216                  A -> G         No
1249                  T -> C         No
1305                  G -> A         Yes
1384                  C -> T         No
1390                  C ->           No
1468                  G -> A         Yes
1470                  A -> T         No
1587                  A ->           No
1587                  A -> C         No
1605                  C -> T         Yes
1623                  C ->           No
1623                  C -> G         No
1668                  T -> G         No
1679                  A -> C         Yes

As noted above, cluster R38144 features 24 segment(s), which were listed in Table 1147 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster R38144_PEA_2_node_21 (SEQ ID NO:937) according to the present invention is supported by 108 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R38144_PEA_2_T6 (SEQ ID NO:135), R38144_PEA_2_T10 (SEQ ID NO:136), R38144_PEA_2_T13 (SEQ ID NO:137), R38144_PEA_2_T15 (SEQ ID NO:138) and R38144_PEA_2_T19 (SEQ ID NO:139). Table 1170 below describes the starting and ending position of this segment on each transcript.

TABLE 1170 Segment location on transcripts

                                    Segment starting   Segment ending
Transcript name                     position           position
R38144_PEA_2_T6 (SEQ ID NO: 135)    626                792
R38144_PEA_2_T10 (SEQ ID NO: 136)   475                641
R38144_PEA_2_T13 (SEQ ID NO: 137)   626                792
R38144_PEA_2_T15 (SEQ ID NO: 138)   626                792
R38144_PEA_2_T19 (SEQ ID NO: 139)   626                792

Segment cluster R38144_PEA_2_node_26 (SEQ ID NO:938) according to the present invention is supported by 98 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R38144_PEA_2_T6 (SEQ ID NO:135), R38144_PEA_2_T10 (SEQ ID NO:136), R38144_PEA_2_T13 (SEQ ID NO:137), R38144_PEA_2_T15 (SEQ ID NO:138) and R38144_PEA_2_T19 (SEQ ID NO:139). Table 1171 below describes the starting and ending position of this segment on each transcript.

TABLE 1171 Segment location on transcripts

                                    Segment starting   Segment ending
Transcript name                     position           position
R38144_PEA_2_T6 (SEQ ID NO: 135)    793                934
R38144_PEA_2_T10 (SEQ ID NO: 136)   642                783
R38144_PEA_2_T13 (SEQ ID NO: 137)   793                934
R38144_PEA_2_T15 (SEQ ID NO: 138)   793                934
R38144_PEA_2_T19 (SEQ ID NO: 139)   793                934

Segment cluster R38144_PEA_2_node_29 (SEQ ID NO:939) according to the present invention is supported by 98 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R38144_PEA_2_T6 (SEQ ID NO:135), R38144_PEA_2_T10 (SEQ ID NO:136), R38144_PEA_2_T13 (SEQ ID NO:137), R38144_PEA_2_T19 (SEQ ID NO:139) and R38144_PEA_2_T27 (SEQ ID NO:140). Table 1172 below describes the starting and ending position of this segment on each transcript.

TABLE 1172 Segment location on transcripts

                                    Segment starting   Segment ending
Transcript name                     position           position
R38144_PEA_2_T6 (SEQ ID NO: 135)    935                1059
R38144_PEA_2_T10 (SEQ ID NO: 136)   784                908
R38144_PEA_2_T13 (SEQ ID NO: 137)   935                1059
R38144_PEA_2_T19 (SEQ ID NO: 139)   935                1059
R38144_PEA_2_T27 (SEQ ID NO: 140)   455                579

Segment cluster R38144_PEA_2_node_31 (SEQ ID NO:940) according to the present invention is supported by 95 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R38144_PEA_2_T6 (SEQ ID NO:135), R38144_PEA_2_T10 (SEQ ID NO:136), R38144_PEA_2_T15 (SEQ ID NO:138), R38144_PEA_2_T19 (SEQ ID NO:139) and R38144_PEA_2_T27 (SEQ ID NO:140). Table 1173 below describes the starting and ending position of this segment on each transcript.

TABLE 1173 Segment location on transcripts

                                    Segment starting   Segment ending
Transcript name                     position           position
R38144_PEA_2_T6 (SEQ ID NO: 135)    1060               1204
R38144_PEA_2_T10 (SEQ ID NO: 136)   909                1053
R38144_PEA_2_T15 (SEQ ID NO: 138)   935                1079
R38144_PEA_2_T19 (SEQ ID NO: 139)   1060               1204
R38144_PEA_2_T27 (SEQ ID NO: 140)   580                724

Segment cluster R38144_PEA_2_node_46 (SEQ ID NO:941) according to the present invention is supported by 147 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R38144_PEA_2_T6 (SEQ ID NO:135), R38144_PEA_2_T10 (SEQ ID NO:136), R38144_PEA_2_T13 (SEQ ID NO:137), R38144_PEA_2_T15 (SEQ ID NO:138) and R38144_PEA_2_T27 (SEQ ID NO:140). Table 1174 below describes the starting and ending position of this segment on each transcript.

TABLE 1174 Segment location on transcripts

                                    Segment starting   Segment ending
Transcript name                     position           position
R38144_PEA_2_T6 (SEQ ID NO: 135)    1373               1544
R38144_PEA_2_T10 (SEQ ID NO: 136)   1197               1368
R38144_PEA_2_T13 (SEQ ID NO: 137)   1203               1374
R38144_PEA_2_T15 (SEQ ID NO: 138)   1223               1394
R38144_PEA_2_T27 (SEQ ID NO: 140)   868                1039

Segment cluster R38144_PEA_2_node_47 (SEQ ID NO:942) according to the present invention is supported by 147 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R38144_PEA_2_T6 (SEQ ID NO:135), R38144_PEA_2_T10 (SEQ ID NO:136), R38144_PEA_2_T13 (SEQ ID NO:137), R38144_PEA_2_T15 (SEQ ID NO:138) and R38144_PEA_2_T27 (SEQ ID NO:140). Table 1175 below describes the starting and ending position of this segment on each transcript.

TABLE 1175 Segment location on transcripts

                                    Segment starting   Segment ending
Transcript name                     position           position
R38144_PEA_2_T6 (SEQ ID NO: 135)    1545               1919
R38144_PEA_2_T10 (SEQ ID NO: 136)   1369               1743
R38144_PEA_2_T13 (SEQ ID NO: 137)   1375               1749
R38144_PEA_2_T15 (SEQ ID NO: 138)   1395               1769
R38144_PEA_2_T27 (SEQ ID NO: 140)   1040               1414

Segment cluster R38144_PEA_2_node_49 (SEQ ID NO:943) according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R38144_PEA_2_T19 (SEQ ID NO:139). Table 1176 below describes the starting and ending position of this segment on each transcript.

TABLE 1176 Segment location on transcripts

                                    Segment starting   Segment ending
Transcript name                     position           position
R38144_PEA_2_T19 (SEQ ID NO: 139)   1327               1448

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.

Segment cluster R38144_PEA_2_node_0 (SEQ ID NO:944) according to the present invention is supported by 101 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R38144_PEA_2_T6 (SEQ ID NO:135), R38144_PEA_2_T10 (SEQ ID NO:136), R38144_PEA_2_T13 (SEQ ID NO:137), R38144_PEA_2_T15 (SEQ ID NO:138), R38144_PEA_2_T19 (SEQ ID NO:139) and R38144_PEA_2_T27 (SEQ ID NO:140). Table 1177 below describes the starting and ending position of this segment on each transcript.

TABLE 1177 Segment location on transcripts

                                    Segment starting   Segment ending
Transcript name                     position           position
R38144_PEA_2_T6 (SEQ ID NO: 135)    1                  105
R38144_PEA_2_T10 (SEQ ID NO: 136)   1                  105
R38144_PEA_2_T13 (SEQ ID NO: 137)   1                  105
R38144_PEA_2_T15 (SEQ ID NO: 138)   1                  105
R38144_PEA_2_T19 (SEQ ID NO: 139)   1                  105
R38144_PEA_2_T27 (SEQ ID NO: 140)   1                  105

Segment cluster R38144_PEA_2_node_1 (SEQ ID NO:945) according to the present invention is supported by 105 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R38144_PEA_2_T6 (SEQ ID NO:135), R38144_PEA_2_T10 (SEQ ID NO:136), R38144_PEA_2_T13 (SEQ ID NO:137), R38144_PEA_2_T15 (SEQ ID NO:138), R38144_PEA_2_T19 (SEQ ID NO:139) and R38144_PEA_2_T27 (SEQ ID NO:140). Table 1178 below describes the starting and ending position of this segment on each transcript.

TABLE 1178 Segment location on transcripts

                                    Segment starting   Segment ending
Transcript name                     position           position
R38144_PEA_2_T6 (SEQ ID NO: 135)    106                197
R38144_PEA_2_T10 (SEQ ID NO: 136)   106                197
R38144_PEA_2_T13 (SEQ ID NO: 137)   106                197
R38144_PEA_2_T15 (SEQ ID NO: 138)   106                197
R38144_PEA_2_T19 (SEQ ID NO: 139)   106                197
R38144_PEA_2_T27 (SEQ ID NO: 140)   106                197

Segment cluster R38144_PEA_2_node_4 (SEQ ID NO:946) according to the present invention is supported by 107 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R38144_PEA_2_T6 (SEQ ID NO:135), R38144_PEA_2_T13 (SEQ ID NO:137), R38144_PEA_2_T15 (SEQ ID NO:138), R38144_PEA_2_T19 (SEQ ID NO:139) and R38144_PEA_2_T27 (SEQ ID NO:140). Table 1179 below describes the starting and ending position of this segment on each transcript.

TABLE 1179 Segment location on transcripts

                                    Segment starting   Segment ending
Transcript name                     position           position
R38144_PEA_2_T6 (SEQ ID NO: 135)    198                299
R38144_PEA_2_T13 (SEQ ID NO: 137)   198                299
R38144_PEA_2_T15 (SEQ ID NO: 138)   198                299
R38144_PEA_2_T19 (SEQ ID NO: 139)   198                299
R38144_PEA_2_T27 (SEQ ID NO: 140)   198                299

Segment cluster R38144_PEA_2_node_5 (SEQ ID NO:947) according to the present invention can be found in the following transcript(s): R38144_PEA_2_T6 (SEQ ID NO:135), R38144_PEA_2_T13 (SEQ ID NO:137), R38144_PEA_2_T15 (SEQ ID NO:138), R38144_PEA_2_T19 (SEQ ID NO:139) and R38144_PEA_2_T27 (SEQ ID NO:140). Table 1180 below describes the starting and ending position of this segment on each transcript.

TABLE 1180 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
R38144_PEA_2_T6 (SEQ ID NO: 135) | 300 | 308
R38144_PEA_2_T13 (SEQ ID NO: 137) | 300 | 308
R38144_PEA_2_T15 (SEQ ID NO: 138) | 300 | 308
R38144_PEA_2_T19 (SEQ ID NO: 139) | 300 | 308
R38144_PEA_2_T27 (SEQ ID NO: 140) | 300 | 308

Segment cluster R38144_PEA_2_node_7 (SEQ ID NO:948) according to the present invention is supported by 92 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R38144_PEA_2_T6 (SEQ ID NO:135), R38144_PEA_2_T13 (SEQ ID NO:137), R38144_PEA_2_T15 (SEQ ID NO:138), R38144_PEA_2_T19 (SEQ ID NO:139) and R38144_PEA_2_T27 (SEQ ID NO:140). Table 1181 below describes the starting and ending position of this segment on each transcript.

TABLE 1181 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
R38144_PEA_2_T6 (SEQ ID NO: 135) | 309 | 348
R38144_PEA_2_T13 (SEQ ID NO: 137) | 309 | 348
R38144_PEA_2_T15 (SEQ ID NO: 138) | 309 | 348
R38144_PEA_2_T19 (SEQ ID NO: 139) | 309 | 348
R38144_PEA_2_T27 (SEQ ID NO: 140) | 309 | 348

Segment cluster R38144_PEA_2_node_11 (SEQ ID NO:949) according to the present invention is supported by 106 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R38144_PEA_2_T6 (SEQ ID NO:135), R38144_PEA_2_T10 (SEQ ID NO:136), R38144_PEA_2_T13 (SEQ ID NO:137), R38144_PEA_2_T15 (SEQ ID NO:138), R38144_PEA_2_T19 (SEQ ID NO:139) and R38144_PEA_2_T27 (SEQ ID NO:140). Table 1182 below describes the starting and ending position of this segment on each transcript.

TABLE 1182 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
R38144_PEA_2_T6 (SEQ ID NO: 135) | 349 | 454
R38144_PEA_2_T10 (SEQ ID NO: 136) | 198 | 303
R38144_PEA_2_T13 (SEQ ID NO: 137) | 349 | 454
R38144_PEA_2_T15 (SEQ ID NO: 138) | 349 | 454
R38144_PEA_2_T19 (SEQ ID NO: 139) | 349 | 454
R38144_PEA_2_T27 (SEQ ID NO: 140) | 349 | 454
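By way of non-limiting illustration, the coordinates in Tables 1177 to 1182 are mutually consistent: transcript R38144_PEA_2_T10 is not listed for segments node_4, node_5 and node_7, and its coordinates for node_11 are shifted by the combined length of those three segments. The following Python sketch reproduces that arithmetic. The names T6_SEGMENTS, ABSENT_FROM_T10, segment_length and shifted_position are hypothetical and are introduced only for this example; positions are assumed to be 1-based and inclusive.

```python
# Segment coordinates on transcript T6, copied from Tables 1177 to 1182.
T6_SEGMENTS = {
    "node_0": (1, 105),
    "node_1": (106, 197),
    "node_4": (198, 299),
    "node_5": (300, 308),
    "node_7": (309, 348),
    "node_11": (349, 454),
}

# Segments for which the tables list no position on transcript T10.
ABSENT_FROM_T10 = ["node_4", "node_5", "node_7"]


def segment_length(start, end):
    """Length of a segment given 1-based inclusive coordinates."""
    return end - start + 1


def shifted_position(position, skipped):
    """Shift a T6 coordinate left by the total length of the skipped segments."""
    offset = sum(segment_length(*T6_SEGMENTS[name]) for name in skipped)
    return position - offset


start, end = T6_SEGMENTS["node_11"]
t10_node_11 = (
    shifted_position(start, ABSENT_FROM_T10),
    shifted_position(end, ABSENT_FROM_T10),
)
print(t10_node_11)  # (198, 303)
```

With the tabulated values, the sketch yields positions 198 and 303, which matches the entry for R38144_PEA_2_T10 in Table 1182.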

Segment cluster R38144_PEA_2_node_14 (SEQ ID NO:950) according to the present invention can be found in the following transcript(s): R38144_PEA_2_T6 (SEQ ID NO:135), R38144_PEA_2_T10 (SEQ ID NO:136), R38144_PEA_2_T13 (SEQ ID NO:137), R38144_PEA_2_T15 (SEQ ID NO:138) and R38144_PEA_2_T19 (SEQ ID NO:139). Table 1183 below describes the starting and ending position of this segment on each transcript.

TABLE 1183 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
R38144_PEA_2_T6 (SEQ ID NO: 135) | 455 | 460
R38144_PEA_2_T10 (SEQ ID NO: 136) | 304 | 309
R38144_PEA_2_T13 (SEQ ID NO: 137) | 455 | 460
R38144_PEA_2_T15 (SEQ ID NO: 138) | 455 | 460
R38144_PEA_2_T19 (SEQ ID NO: 139) | 455 | 460

Segment cluster R38144_PEA_node_11 (SEQ ID NO:951) according to the present invention is supported by 105 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R38144_PEA_2_T6 (SEQ ID NO:135), R38144_PEA_2_T10 (SEQ ID NO:136), R38144_PEA_2_T13 (SEQ ID NO:137), R38144_PEA_2_T15 (SEQ ID NO:138) and R38144_PEA_2_T19 (SEQ ID NO:139). Table 1184 below describes the starting and ending position of this segment on each transcript.

TABLE 1184 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
R38144_PEA_2_T6 (SEQ ID NO: 135) | 461 | 487
R38144_PEA_2_T10 (SEQ ID NO: 136) | 310 | 336
R38144_PEA_2_T13 (SEQ ID NO: 137) | 461 | 487
R38144_PEA_2_T15 (SEQ ID NO: 138) | 461 | 487
R38144_PEA_2_T19 (SEQ ID NO: 139) | 461 | 487

Segment cluster R38144_PEA_2_node_16 (SEQ ID NO:952) according to the present invention is supported by 106 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R38144_PEA_2_T6 (SEQ ID NO:135), R38144_PEA_2_T10 (SEQ ID NO:136), R38144_PEA_2_T13 (SEQ ID NO:137), R38144_PEA_2_T15 (SEQ ID NO:138) and R38144_PEA_2_T19 (SEQ ID NO:139). Table 1185 below describes the starting and ending position of this segment on each transcript.

TABLE 1185 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
R38144_PEA_2_T6 (SEQ ID NO: 135) | 488 | 580
R38144_PEA_2_T10 (SEQ ID NO: 136) | 337 | 429
R38144_PEA_2_T13 (SEQ ID NO: 137) | 488 | 580
R38144_PEA_2_T15 (SEQ ID NO: 138) | 488 | 580
R38144_PEA_2_T19 (SEQ ID NO: 139) | 488 | 580

Segment cluster R38144_PEA_2_node_19 (SEQ ID NO:953) according to the present invention is supported by 93 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R38144_PEA_2_T6 (SEQ ID NO:135), R38144_PEA_2_T10 (SEQ ID NO:136), R38144_PEA_2_T13 (SEQ ID NO:137), R38144_PEA_2_T15 (SEQ ID NO:138) and R38144_PEA_2_T19 (SEQ ID NO:139). Table 1186 below describes the starting and ending position of this segment on each transcript.

TABLE 1186 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
R38144_PEA_2_T6 (SEQ ID NO: 135) | 581 | 615
R38144_PEA_2_T10 (SEQ ID NO: 136) | 430 | 464
R38144_PEA_2_T13 (SEQ ID NO: 137) | 581 | 615
R38144_PEA_2_T15 (SEQ ID NO: 138) | 581 | 615
R38144_PEA_2_T19 (SEQ ID NO: 139) | 581 | 615

Segment cluster R38144_PEA_2_node_20 (SEQ ID NO:954) according to the present invention can be found in the following transcript(s): R38144_PEA_2_T6 (SEQ ID NO:135), R38144_PEA_2_T10 (SEQ ID NO:136), R38144_PEA_2_T13 (SEQ ID NO:137), R38144_PEA_2_T15 (SEQ ID NO:138) and R38144_PEA_2_T19 (SEQ ID NO:139). Table 1187 below describes the starting and ending position of this segment on each transcript.

TABLE 1187 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
R38144_PEA_2_T6 (SEQ ID NO: 135) | 616 | 625
R38144_PEA_2_T10 (SEQ ID NO: 136) | 465 | 474
R38144_PEA_2_T13 (SEQ ID NO: 137) | 616 | 625
R38144_PEA_2_T15 (SEQ ID NO: 138) | 616 | 625
R38144_PEA_2_T19 (SEQ ID NO: 139) | 616 | 625

Segment cluster R38144_PEA_2_node_36 (SEQ ID NO:955) according to the present invention is supported by 95 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R38144_PEA_2_T6 (SEQ ID NO:135), R38144_PEA_2_T10 (SEQ ID NO:136), R38144_PEA_2_T13 (SEQ ID NO:137), R38144_PEA_2_T15 (SEQ ID NO:138), R38144_PEA_2_T19 (SEQ ID NO:139) and R38144_PEA_2_T27 (SEQ ID NO:140). Table 1188 below describes the starting and ending position of this segment on each transcript.

TABLE 1188 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
R38144_PEA_2_T6 (SEQ ID NO: 135) | 1205 | 1293
R38144_PEA_2_T10 (SEQ ID NO: 136) | 1054 | 1142
R38144_PEA_2_T13 (SEQ ID NO: 137) | 1060 | 1148
R38144_PEA_2_T15 (SEQ ID NO: 138) | 1080 | 1168
R38144_PEA_2_T19 (SEQ ID NO: 139) | 1205 | 1293
R38144_PEA_2_T27 (SEQ ID NO: 140) | 725 | 813

Segment cluster R38144_PEA_2_node_37 (SEQ ID NO:956) according to the present invention is supported by 97 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R38144_PEA_2_T6 (SEQ ID NO:135), R38144_PEA_2_T10 (SEQ ID NO:136), R38144_PEA_2_T13 (SEQ ID NO:137), R38144_PEA_2_T15 (SEQ ID NO:138), R38144_PEA_2_T19 (SEQ ID NO:139) and R38144_PEA_2_T27 (SEQ ID NO:140). Table 1213 below describes the starting and ending position of this segment on each transcript.

TABLE 1213 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
R38144_PEA_2_T6 (SEQ ID NO: 135) | 1294 | 1326
R38144_PEA_2_T10 (SEQ ID NO: 136) | 1143 | 1175
R38144_PEA_2_T13 (SEQ ID NO: 137) | 1149 | 1181
R38144_PEA_2_T15 (SEQ ID NO: 138) | 1169 | 1201
R38144_PEA_2_T19 (SEQ ID NO: 139) | 1294 | 1326
R38144_PEA_2_T27 (SEQ ID NO: 140) | 814 | 846

Segment cluster R38144_PEA_2_node_43 (SEQ ID NO:957) according to the present invention can be found in the following transcript(s): R38144_PEA_2_T6 (SEQ ID NO:135). Table 1189 below describes the starting and ending position of this segment on each transcript.

TABLE 1189 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
R38144_PEA_2_T6 (SEQ ID NO: 135) | 1327 | 1346

Segment cluster R38144_PEA_2_node_44 (SEQ ID NO:958) according to the present invention can be found in the following transcript(s): R38144_PEA_2_T6 (SEQ ID NO:135). Table 1190 below describes the starting and ending position of this segment on each transcript.

TABLE 1190 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
R38144_PEA_2_T6 (SEQ ID NO: 135) | 1347 | 1351

Segment cluster R38144_PEA_2_node_45 (SEQ ID NO:959) according to the present invention can be found in the following transcript(s): R38144_PEA_2_T6 (SEQ ID NO:135), R38144_PEA_2_T10 (SEQ ID NO:136), R38144_PEA_2_T13 (SEQ ID NO:137), R38144_PEA_2_T15 (SEQ ID NO:138) and R38144_PEA_2_T27 (SEQ ID NO:140). Table 1191 below describes the starting and ending position of this segment on each transcript.

TABLE 1191 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
R38144_PEA_2_T6 (SEQ ID NO: 135) | 1352 | 1372
R38144_PEA_2_T10 (SEQ ID NO: 136) | 1176 | 1196
R38144_PEA_2_T13 (SEQ ID NO: 137) | 1182 | 1202
R38144_PEA_2_T15 (SEQ ID NO: 138) | 1202 | 1222
R38144_PEA_2_T27 (SEQ ID NO: 140) | 847 | 867

Segment cluster R38144_PEA_2_node_51 (SEQ ID NO:960) according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R38144_PEA_2_T19 (SEQ ID NO:139). Table 1192 below describes the starting and ending position of this segment on each transcript.

TABLE 1192 Segment location on transcripts

Transcript name | Segment starting position | Segment ending position
R38144_PEA_2_T19 (SEQ ID NO: 139) | 1449 | 1522

Variant protein alignment to the previously known protein:

Description for Cluster HUMOSTRO

Cluster HUMOSTRO features 3 transcript(s) and 30 segment(s) of interest, the names for which are given in Tables 1193 and 1194, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 1195.

TABLE 1193 Transcripts of interest

Transcript Name | Sequence ID No.
HUMOSTRO_PEA_1_PEA_1_T14 | 141
HUMOSTRO_PEA_1_PEA_1_T16 | 142
HUMOSTRO_PEA_1_PEA_1_T30 | 143

TABLE 1194 Segments of interest

Segment Name | Sequence ID No.
HUMOSTRO_PEA_1_PEA_1_node_0 | 961
HUMOSTRO_PEA_1_PEA_1_node_10 | 962
HUMOSTRO_PEA_1_PEA_1_node_16 | 963
HUMOSTRO_PEA_1_PEA_1_node_23 | 964
HUMOSTRO_PEA_1_PEA_1_node_31 | 965
HUMOSTRO_PEA_1_PEA_1_node_43 | 966
HUMOSTRO_PEA_1_PEA_1_node_3 | 967
HUMOSTRO_PEA_1_PEA_1_node_5 | 968
HUMOSTRO_PEA_1_PEA_1_node_7 | 969
HUMOSTRO_PEA_1_PEA_1_node_8 | 970
HUMOSTRO_PEA_1_PEA_1_node_15 | 971
HUMOSTRO_PEA_1_PEA_1_node_17 | 972
HUMOSTRO_PEA_1_PEA_1_node_20 | 973
HUMOSTRO_PEA_1_PEA_1_node_21 | 974
HUMOSTRO_PEA_1_PEA_1_node_22 | 975
HUMOSTRO_PEA_1_PEA_1_node_24 | 976
HUMOSTRO_PEA_1_PEA_1_node_26 | 977
HUMOSTRO_PEA_1_PEA_1_node_27 | 978
HUMOSTRO_PEA_1_PEA_1_node_28 | 979
HUMOSTRO_PEA_1_PEA_1_node_29 | 980
HUMOSTRO_PEA_1_PEA_1_node_30 | 981
HUMOSTRO_PEA_1_PEA_1_node_32 | 982
HUMOSTRO_PEA_1_PEA_1_node_34 | 983
HUMOSTRO_PEA_1_PEA_1_node_36 | 984
HUMOSTRO_PEA_1_PEA_1_node_37 | 985
HUMOSTRO_PEA_1_PEA_1_node_38 | 986
HUMOSTRO_PEA_1_PEA_1_node_39 | 987
HUMOSTRO_PEA_1_PEA_1_node_40 | 988
HUMOSTRO_PEA_1_PEA_1_node_41 | 989
HUMOSTRO_PEA_1_PEA_1_node_42 | 990

TABLE 1195 Proteins of interest

Protein Name | Sequence ID No. | Corresponding Transcript(s)
HUMOSTRO_PEA_1_PEA_1_P21 | 1627 | HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO: 141)
HUMOSTRO_PEA_1_PEA_1_P25 | 1628 | HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO: 142)
HUMOSTRO_PEA_1_PEA_1_P30 | 1629 | HUMOSTRO_PEA_1_PEA_1_T30 (SEQ ID NO: 143)

These sequences are variants of the known protein Osteopontin precursor (SwissProt accession identifier OSTP_HUMAN; known also according to the synonyms Bone sialoprotein 1; Urinary stone protein; Secreted phosphoprotein 1; SPP-1; Nephropontin; Uropontin), SEQ ID NO:1462, referred to herein as the previously known protein.

Protein Osteopontin precursor (SEQ ID NO:1462) is known or believed to have the following function(s): Binds tightly to hydroxyapatite. Appears to form an integral part of the mineralized matrix. Probably important to cell-matrix interaction. Acts as a cytokine involved in enhancing production of interferon-gamma and interleukin-12 and reducing production of interleukin-10 and is essential in the pathway that leads to type I immunity (By similarity). The sequence for protein Osteopontin precursor is given at the end of the application, as “Osteopontin precursor amino acid sequence”. Known polymorphisms for this sequence are as shown in Table 1196.

TABLE 1196 Amino acid mutations for Known Protein

SNP position(s) on amino acid sequence | Comment
301 | R -> H (in dbSNP: 4660). /FTId = VAR_014717.
188 | D -> H
237 | T -> A
275-278 | SHEF -> GNSL

Protein Osteopontin precursor (SEQ ID NO:1462) localization is believed to be Secreted.

The previously known protein also has the following indication(s) and/or potential therapeutic use(s): Regeneration, bone. It has been investigated for clinical/therapeutic use in humans, for example as a target for an antibody or small molecule, and/or as a direct therapeutic; available information related to these investigations is as follows. Potential pharmaceutically related or therapeutically related activity or activities of the previously known protein are as follows: Bone formation stimulant. A therapeutic role for a protein represented by the cluster has been predicted. The cluster was assigned this field because there was information in the drug database or the public databases (e.g., described herein above) that this protein, or part thereof, is used or can be used for a potential therapeutic indication: Musculoskeletal.

The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: ossification; anti-apoptosis; inflammatory response; cell-matrix adhesion; cell-cell signaling, which are annotation(s) related to Biological Process; defense/immunity protein; cytokine; integrin ligand; protein binding; growth factor; apoptosis inhibitor, which are annotation(s) related to Molecular Function; and extracellular matrix, which are annotation(s) related to Cellular Component.

The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <dot expasy dot ch/sprot/>; or Locuslink, available from <dot ncbi dot nlm dot nih dot gov/projects/LocusLink/>.

Cluster HUMOSTRO can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term “number” in the right hand column of the table and the numbers on the y-axis of FIG. 46 refer to weighted expression of ESTs in each category, as “parts per million” (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
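As an illustrative sketch only, the “parts per million” figure described above can be computed as the ratio of the ESTs for a cluster to all ESTs in the category, scaled by one million. The weighting applied to the EST counts is not restated in this passage, so the example operates on plain counts; the function name parts_per_million and the counts used below are hypothetical.

```python
def parts_per_million(cluster_est_count, total_est_count):
    """Express the share of a category's ESTs that belong to one cluster, in ppm."""
    if total_est_count <= 0:
        raise ValueError("total_est_count must be positive")
    return 1_000_000 * cluster_est_count / total_est_count


# Hypothetical counts: 5 ESTs of the cluster among 500,000 ESTs in the category.
print(parts_per_million(5, 500_000))  # 10.0
```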

Overall, the following results were obtained as shown with regard to the histograms in FIG. 46 and Table 1197. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: epithelial malignant tumors, a mixture of malignant tumors from different tissues, lung malignant tumors, breast malignant tumors, ovarian carcinoma and skin malignancies.

TABLE 1197 Normal tissue distribution

Name of Tissue | Number
Adrenal | 4
Bladder | 0
Bone | 897
Brain | 506
Colon | 69
Epithelial | 548
General | 484
head and neck | 50
Kidney | 5618
Liver | 4
Lung | 10
lymph nodes | 75
Breast | 8
bone marrow | 62
Muscle | 37
Ovary | 40
Pancreas | 845
Prostate | 48
Skin | 13
Stomach | 73
Thyroid | 0
Uterus | 168

TABLE 1198 P values and ratios for expression in cancerous tissue

Name of Tissue | P1 | P2 | SP1 | R3 | SP2 | R4
Adrenal | 1.5e−01 | 2.1e−01 | 2.0e−02 | 4.6 | 4.4e−02 | 3.6
Bladder | 1.2e−01 | 9.2e−02 | 5.7e−02 | 4.1 | 2.1e−02 | 4.3
Bone | 4.9e−01 | 7.4e−01 | 4.1e−06 | 0.6 | 5.4e−01 | 0.4
Brain | 6.6e−01 | 7.0e−01 | 3.2e−01 | 0.6 | 1 | 0.4
Colon | 2.7e−01 | 4.0e−01 | 3.1e−01 | 1.5 | 5.2e−01 | 1.1
Epithelial | 2.0e−07 | 1.6e−03 | 9.8e−01 | 0.7 | 1 | 0.5
General | 1.2e−06 | 1.2e−02 | 7.9e−01 | 0.8 | 1 | 0.6
head and neck | 3.4e−01 | 5.0e−01 | 1 | 0.7 | 1 | 0.7
Kidney | 6.8e−01 | 7.4e−01 | 1 | 0.2 | 1 | 0.1
Liver | 3.3e−01 | 2.5e−01 | 1 | 1.8 | 2.3e−01 | 2.6
Lung | 4.3e−04 | 4.6e−03 | 2.1e−30 | 15.0 | 2.8e−27 | 23.5
lymph nodes | 6.7e−01 | 8.7e−01 | 8.1e−01 | 0.7 | 9.9e−01 | 0.3
Breast | 2.3e−01 | 3.0e−01 | 1.9e−04 | 6.2 | 4.1e−03 | 4.3
bone marrow | 7.5e−01 | 7.8e−01 | 1 | 0.3 | 2.0e−02 | 1.2
Muscle | 4.0e−02 | 7.5e−02 | 1.1e−01 | 4.6 | 5.1e−01 | 1.5
Ovary | 4.7e−02 | 8.4e−02 | 1.9e−05 | 5.4 | 8.3e−04 | 3.7
Pancreas | 5.0e−02 | 3.3e−01 | 1 | 0.3 | 1 | 0.2
Prostate | 8.5e−01 | 9.0e−01 | 8.9e−01 | 0.7 | 9.5e−01 | 0.6
Skin | 1.6e−01 | 1.6e−01 | 1.2e−10 | 12.6 | 5.2e−04 | 4.1
Stomach | 1.5e−01 | 6.3e−01 | 5.0e−01 | 1.2 | 9.4e−01 | 0.6
Thyroid | 2.9e−01 | 2.9e−01 | 5.9e−02 | 2.0 | 5.9e−02 | 2.0
Uterus | 6.1e−02 | 5.7e−01 | 1.1e−01 | 1.3 | 7.0e−01 | 0.7

As noted above, cluster HUMOSTRO features 3 transcript(s), which were listed in Table 1193 above. These transcript(s) encode for protein(s) which are variant(s) of protein Osteopontin precursor (SEQ ID NO:1462). A description of each variant protein according to the present invention is now provided.

Variant protein HUMOSTRO_PEA_1_PEA_1_P21 (SEQ ID NO:1627) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO:141). An alignment is given to the known protein (Osteopontin precursor (SEQ ID NO:1462)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HUMOSTRO_PEA_1_PEA_1_P21 (SEQ ID NO:1627) and OSTP_HUMAN (SEQ ID NO:1462):

1. An isolated chimeric polypeptide encoding for HUMOSTRO_PEA_1_PEA_1_P21 (SEQ ID NO:1627), comprising a first amino acid sequence being at least 90% homologous to MRIAVICFCLLGITCAIPVKQADSGSSEEKQLYNKYPDAVATWLNPDPSQKQNLLAPQ corresponding to amino acids 1-58 of OSTP_HUMAN (SEQ ID NO:1462), which also corresponds to amino acids 1-58 of HUMOSTRO_PEA_1_PEA_1_P21 (SEQ ID NO:1627), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VFLNFS (SEQ ID NO: 261) corresponding to amino acids 59-64 of HUMOSTRO_PEA_1_PEA_1_P21 (SEQ ID NO:1627), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of HUMOSTRO_PEA_1_PEA_1_P21 (SEQ ID NO:1627), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VFLNFS (SEQ ID NO: 261) in HUMOSTRO_PEA_1_PEA_1_P21 (SEQ ID NO:1627).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because of manual inspection of known protein localization and/or gene structure.

Variant protein HUMOSTRO_PEA_1_PEA_1_P21 (SEQ ID NO:1627) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 1199 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMOSTRO_PEA_1_PEA_1_P21 (SEQ ID NO:1627) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1199 Amino acid mutations

SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
7 | C -> W | No
31 | Q -> R | No
47 | D -> V | Yes
49 | S -> P | No
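By way of non-limiting illustration, the substitutions of Table 1199 can be checked against the variant sequence recited in the comparison report above (the 58-residue head followed by the tail VFLNFS). The following Python sketch is an example under stated assumptions rather than part of the described method: positions are taken to be 1-based, and the names P21, SNPS and apply_substitution are hypothetical.

```python
# Sequence of the variant as recited in the comparison report: head (1-58) plus tail (59-64).
P21 = "MRIAVICFCLLGITCAIPVKQADSGSSEEKQLYNKYPDAVATWLNPDPSQKQNLLAPQ" + "VFLNFS"

# (position, reference residue, alternative residue), copied from Table 1199.
SNPS = [(7, "C", "W"), (31, "Q", "R"), (47, "D", "V"), (49, "S", "P")]


def apply_substitution(sequence, position, reference, alternative):
    """Return a copy of sequence with one residue replaced at a 1-based position."""
    if sequence[position - 1] != reference:
        raise ValueError(f"expected {reference} at position {position}")
    return sequence[: position - 1] + alternative + sequence[position:]


for position, reference, alternative in SNPS:
    variant = apply_substitution(P21, position, reference, alternative)
    print(position, reference, "->", variant[position - 1])
```

Each reference residue listed in Table 1199 is found at the stated position of the recited sequence, so none of the four substitutions raises an error.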

The glycosylation sites of variant protein HUMOSTRO_PEA_1_PEA_1_P21 (SEQ ID NO:1627), as compared to the known protein Osteopontin precursor (SEQ ID NO:1462), are described in Table 1200 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).

TABLE 1200 Glycosylation site(s)

Position(s) on known amino acid sequence | Present in variant protein?
79 | no
106 | no

Variant protein HUMOSTRO_PEA_1_PEA_1_P21 (SEQ ID NO:1627) is encoded by the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO:141), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO:141) is shown in bold; this coding portion starts at position 199 and ends at position 390. The transcript also has the following SNPs as listed in Table 1201 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMOSTRO_PEA_1_PEA_1_P21 (SEQ ID NO:1627) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1201 Nucleic acid SNPs

SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
136 | A -> G | Yes
154 | T -> | No
159 | G -> T | Yes
219 | C -> G | No
274 | -> G | No
290 | A -> G | No
338 | A -> T | Yes
343 | T -> C | No
413 | G -> C | Yes
707 | C -> T | Yes
708 | C -> A | Yes
715 | A -> G | Yes
730 | A -> C | No
730 | A -> G | No
746 | T -> C | Yes
767 | C -> T | No
779 | G -> A | Yes
866 | -> G | No
869 | T -> | No
889 | -> A | No
891 | A -> C | No
891 | A -> G | No
905 | T -> C | No
910 | -> G | No
910 | -> T | No
997 | A -> G | No
1026 | G -> C | No
1042 | -> G | No
1042 | -> T | No
1071 | A -> | No
1071 | A -> C | No
1098 | A -> | No
1105 | C -> T | No
1124 | -> G | No
1135 | G -> A | Yes
1136 | T -> | No
1136 | T -> G | No
1173 | A -> C | No
1173 | A -> G | No
1179 | A -> G | No
1214 | C -> T | Yes
1246 | T -> | No
1246 | T -> A | No
1359 | A -> | No
1359 | A -> G | No
1362 | T -> | No
1365 | C -> T | Yes
1366 | G -> A | Yes
1408 | A -> C | No
1418 | A -> C | No
1433 | A -> C | No
1456 | A -> C | No
1524 | T -> A | No
1524 | T -> C | No
1547 | A -> G | Yes
1553 | T -> | No
1574 | -> G | No
1654 | A -> C | Yes
1691 | A -> G | No
1703 | A -> C | Yes
1755 | A -> C | No
1764 | T -> | No
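As a non-limiting illustration, the nucleotide positions of Table 1201 can be related to the amino acid positions of Table 1199 by counting codons from the start of the coding portion (position 199). The following Python sketch assumes 1-based positions and an uninterrupted reading frame beginning at position 199; it handles single-nucleotide substitutions only, and the name codon_number is hypothetical.

```python
CODING_START = 199  # first nucleotide of the coding portion of transcript T14
CODING_END = 390    # last nucleotide of the coding portion of transcript T14


def codon_number(nucleotide_position, coding_start=CODING_START, coding_end=CODING_END):
    """Return the 1-based codon index of a nucleotide, or None outside the coding portion."""
    if not coding_start <= nucleotide_position <= coding_end:
        return None
    return (nucleotide_position - coding_start) // 3 + 1


# Substitution positions taken from Table 1201.
for position in (219, 290, 338, 343, 136, 413):
    print(position, codon_number(position))
# 219 7
# 290 31
# 338 47
# 343 49
# 136 None
# 413 None
```

Under these assumptions, the substitutions at nucleotide positions 219, 290, 338 and 343 fall in codons 7, 31, 47 and 49, which are the amino acid positions listed in Table 1199; positions 136 and 413 lie outside the stated coding portion.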

Variant protein HUMOSTRO_PEA_1_PEA_1_P25 (SEQ ID NO:1628) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO:142). An alignment is given to the known protein (Osteopontin precursor (SEQ ID NO:1462)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HUMOSTRO_PEA_1_PEA_1_P25 (SEQ ID NO:1628) and OSTP_HUMAN (SEQ ID NO:1462):

1. An isolated chimeric polypeptide encoding for HUMOSTRO_PEA_1_PEA_1_P25 (SEQ ID NO:1628), comprising a first amino acid sequence being at least 90% homologous to MRIAVICFCLLGITCAIPVKQADSGSSEEKQ corresponding to amino acids 1-31 of OSTP_HUMAN (SEQ ID NO:1462), which also corresponds to amino acids 1-31 of HUMOSTRO_PEA_1_PEA_1_P25 (SEQ ID NO:1628), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence H corresponding to amino acids 32-32 of HUMOSTRO_PEA_1_PEA_1_P25 (SEQ ID NO:1628), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein HUMOSTRO_PEA_1_PEA_1_P25 (SEQ ID NO:1628) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 1202 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMOSTRO_PEA_1_PEA_1_P25 (SEQ ID NO:1628) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1202 Amino acid mutations

SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
7 | C -> W | No
31 | Q -> R | No

The glycosylation sites of variant protein HUMOSTRO_PEA_1_PEA_1_P25 (SEQ ID NO:1628), as compared to the known protein Osteopontin precursor (SEQ ID NO:1462), are described in Table 1203 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).

TABLE 1203 Glycosylation site(s)

Position(s) on known amino acid sequence | Present in variant protein?
79 | no
106 | no

Variant protein HUMOSTRO_PEA_1_PEA_1_P25 (SEQ ID NO:1628) is encoded by the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO:142), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO:142) is shown in bold; this coding portion starts at position 199 and ends at position 294. The transcript also has the following SNPs as listed in Table 1204 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMOSTRO_PEA_1_PEA_1_P25 (SEQ ID NO:1628) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1204 Nucleic acid SNPs

SNP position on nucleotide sequence | Alternative nucleic acid | Previously known SNP?
136 | A -> G | Yes
154 | T -> | No
159 | G -> T | Yes
219 | C -> G | No
274 | -> G | No
290 | A -> G | No
419 | C -> T | Yes
454 | G -> C | Yes
527 | A -> T | Yes
532 | T -> C | No
630 | C -> T | Yes
631 | C -> A | Yes
638 | A -> G | Yes
653 | A -> C | No
653 | A -> G | No
669 | T -> C | Yes
690 | C -> T | No
702 | G -> A | Yes
789 | -> G | No
792 | T -> | No
812 | -> A | No
814 | A -> C | No
814 | A -> G | No
828 | T -> C | No
833 | -> G | No
833 | -> T | No
920 | A -> G | No
949 | G -> C | No
965 | -> G | No
965 | -> T | No
994 | A -> | No
994 | A -> C | No
1021 | A -> | No
1028 | C -> T | No
1047 | -> G | No
1058 | G -> A | Yes
1059 | T -> | No
1059 | T -> G | No
1096 | A -> C | No
1096 | A -> G | No
1102 | A -> G | No
1137 | C -> T | Yes
1169 | T -> | No
1169 | T -> A | No
1282 | A -> | No
1282 | A -> G | No
1285 | T -> | No
1288 | C -> T | Yes
1289 | G -> A | Yes
1331 | A -> C | No
1341 | A -> C | No
1356 | A -> C | No
1379 | A -> C | No
1447 | T -> A | No
1447 | T -> C | No
1470 | A -> G | Yes
1476 | T -> | No
1497 | -> G | No
1577 | A -> C | Yes
1614 | A -> G | No
1626 | A -> C | Yes
1678 | A -> C | No
1687 | T -> | No

Variant protein HUMOSTRO_PEA_1_PEA_1_P30 (SEQ ID NO:1629) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMOSTRO_PEA_1_PEA_1_T30 (SEQ ID NO:143). An alignment is given to the known protein (Osteopontin precursor (SEQ ID NO:1462)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between HUMOSTRO_PEA_1_PEA_1_P30 (SEQ ID NO:1629) and OSTP_HUMAN (SEQ ID NO:1462):

1. An isolated chimeric polypeptide encoding for HUMOSTRO_PEA_1_PEA_1_P30 (SEQ ID NO:1629), comprising a first amino acid sequence being at least 90% homologous to MRIAVICFCLLGITCAIPVKQADSGSSEEKQ corresponding to amino acids 1-31 of OSTP_HUMAN (SEQ ID NO:1462), which also corresponds to amino acids 1-31 of HUMOSTRO_PEA_1_PEA_1_P30 (SEQ ID NO:1629), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VSIFYVFI (SEQ ID NO: 262) corresponding to amino acids 32-39 of HUMOSTRO_PEA_1_PEA_1_P30 (SEQ ID NO:1629), wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of HUMOSTRO_PEA_1_PEA_1_P30 (SEQ ID NO:1629), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VSIFYVFI (SEQ ID NO: 262) in HUMOSTRO_PEA_1_PEA_1_P30 (SEQ ID NO:1629).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein HUMOSTRO_PEA_1_PEA_1_P30 (SEQ ID NO:1629) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 1205 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMOSTRO_PEA_1_PEA_1_P30 (SEQ ID NO:1629) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1205 Amino acid mutations

SNP position(s) on amino acid sequence | Alternative amino acid(s) | Previously known SNP?
7 | C -> W | No
31 | Q -> R | No

The glycosylation sites of variant protein HUMOSTRO_PEA_1_PEA_1_P30 (SEQ ID NO:1629), as compared to the known protein Osteopontin precursor (SEQ ID NO:1462), are described in Table 1206 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).

TABLE 1206 Glycosylation site(s)

Position(s) on known amino acid sequence | Present in variant protein?
79 | no
106 | no

Variant protein HUMOSTRO_PEA_1_PEA_1_P30 (SEQ ID NO:1629) is encoded by the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T30 (SEQ ID NO:143), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMOSTRO_PEA_1_PEA_1_T30 (SEQ ID NO:143) is shown in bold; this coding portion starts at position 199 and ends at position 315. The transcript also has the following SNPs as listed in Table 1207 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMOSTRO_PEA_1_PEA_1_P30 (SEQ ID NO:1629) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1207 Nucleic acid SNPs

SNP position on nucleotide sequence    Alternative nucleic acid    Previously known SNP?
136                                    A -> G                      Yes
154                                    T ->                        No
159                                    G -> T                      Yes
219                                    C -> G                      No
274                                    -> G                        No
290                                    A -> G                      No
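The relationship between the nucleotide SNP positions of Table 1207 and the amino acid positions of Table 1205 can be illustrated with a minimal sketch. It assumes that positions are 1-based and inclusive, and that the reading frame starts at the first nucleotide of the coding portion (position 199); the function name `codon_index` is hypothetical and is not part of the application.

```python
# Minimal sketch: map a transcript SNP position to a codon number.
# Assumptions (not stated in the application): positions are 1-based and
# inclusive, and the reading frame starts at the first coding nucleotide.

def codon_index(nt_pos, cds_start, cds_end):
    """Return the 1-based codon number for nt_pos, or None outside the coding portion."""
    if not cds_start <= nt_pos <= cds_end:
        return None
    return (nt_pos - cds_start) // 3 + 1

# Coding portion of HUMOSTRO_PEA_1_PEA_1_T30 as given in the text.
CDS_START, CDS_END = 199, 315

# SNP positions as listed in Table 1207.
snp_positions = [136, 154, 159, 219, 274, 290]
codons = {pos: codon_index(pos, CDS_START, CDS_END) for pos in snp_positions}

for pos, codon in codons.items():
    where = f"codon {codon}" if codon is not None else "outside coding portion"
    print(pos, where)
```

Under these assumptions, nucleotide positions 219 and 290 fall in codons 7 and 31, which agrees with the amino acid positions listed in Table 1205, while positions 136, 154 and 159 lie upstream of the coding portion.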

As noted above, cluster HUMOSTRO features 30 segment(s), which were listed in Table 1194 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster HUMOSTRO_PEA_1_PEA_1_node_0 (SEQ ID NO:961) according to the present invention is supported by 333 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO:141), HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO:142) and HUMOSTRO_PEA_1_PEA_1_T30 (SEQ ID NO:143). Table 1208 below describes the starting and ending position of this segment on each transcript.

TABLE 1208 Segment location on transcripts

Transcript name                              Segment starting position    Segment ending position
HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO: 141)    1                            184
HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO: 142)    1                            184
HUMOSTRO_PEA_1_PEA_1_T30 (SEQ ID NO: 143)    1                            184

Segment cluster HUMOSTRO_PEA_1_PEA_1_node_10 (SEQ ID NO:962) according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO:142). Table 1209 below describes the starting and ending position of this segment on each transcript.

TABLE 1209 Segment location on transcripts

Transcript name                              Segment starting position    Segment ending position
HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO: 142)    292                          480

Segment cluster HUMOSTRO_PEA_1_PEA_1_node_16 (SEQ ID NO:963) according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO:141). Table 1210 below describes the starting and ending position of this segment on each transcript.

TABLE 1210 Segment location on transcripts

Transcript name                              Segment starting position    Segment ending position
HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO: 141)    373                          638

Segment cluster HUMOSTRO_PEA_1_PEA_1_node_23 (SEQ ID NO:964) according to the present invention is supported by 334 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO:141) and HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO:142). Table 1211 below describes the starting and ending position of this segment on each transcript.

TABLE 1211 Segment location on transcripts

Transcript name                              Segment starting position    Segment ending position
HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO: 141)    804                          967
HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO: 142)    727                          890

Segment cluster HUMOSTRO_PEA_1_PEA_1_node_31 (SEQ ID NO:965) according to the present invention is supported by 350 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO:141) and HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO:142). Table 1212 below describes the starting and ending position of this segment on each transcript.

TABLE 1212 Segment location on transcripts

Transcript name                              Segment starting position    Segment ending position
HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO: 141)    1164                         1393
HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO: 142)    1087                         1316

Segment cluster HUMOSTRO_PEA_1_PEA_1_node_43 (SEQ ID NO:966) according to the present invention is supported by 192 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO:141) and HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO:142). Table 1213 below describes the starting and ending position of this segment on each transcript.

TABLE 1213 Segment location on transcripts

Transcript name                              Segment starting position    Segment ending position
HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO: 141)    1810                         1846
HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO: 142)    1733                         1769

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.

Segment cluster HUMOSTRO_PEA_1_PEA_1_node_3 (SEQ ID NO:967) according to the present invention is supported by 353 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO:141), HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO:142) and HUMOSTRO_PEA_1_PEA_1_T30 (SEQ ID NO:143). Table 1214 below describes the starting and ending position of this segment on each transcript.

TABLE 1214 Segment location on transcripts

Transcript name                              Segment starting position    Segment ending position
HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO: 141)    185                          210
HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO: 142)    185                          210
HUMOSTRO_PEA_1_PEA_1_T30 (SEQ ID NO: 143)    185                          210

Segment cluster HUMOSTRO_PEA_1_PEA_1_node_5 (SEQ ID NO:968) according to the present invention is supported by 353 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO:141), HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO:142) and HUMOSTRO_PEA_1_PEA_1_T30 (SEQ ID NO:143). Table 1215 below describes the starting and ending position of this segment on each transcript.

TABLE 1215 Segment location on transcripts

Transcript name                              Segment starting position    Segment ending position
HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO: 141)    211                          252
HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO: 142)    211                          252
HUMOSTRO_PEA_1_PEA_1_T30 (SEQ ID NO: 143)    211                          252

Segment cluster HUMOSTRO_PEA_1_PEA_1_node_7 (SEQ ID NO:969) according to the present invention is supported by 357 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO:141), HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO:142) and HUMOSTRO_PEA_1_PEA_1_T30 (SEQ ID NO:143). Table 1216 below describes the starting and ending position of this segment on each transcript.

TABLE 1216 Segment location on transcripts

Transcript name                              Segment starting position    Segment ending position
HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO: 141)    253                          291
HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO: 142)    253                          291
HUMOSTRO_PEA_1_PEA_1_T30 (SEQ ID NO: 143)    253                          291

Segment cluster HUMOSTRO_PEA_1_PEA_1_node_8 (SEQ ID NO:970) according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T30 (SEQ ID NO:143). Table 1217 below describes the starting and ending position of this segment on each transcript.

TABLE 1217 Segment location on transcripts

Transcript name                              Segment starting position    Segment ending position
HUMOSTRO_PEA_1_PEA_1_T30 (SEQ ID NO: 143)    292                          378
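The segment tables can be read together to reconstruct how the segments tile a transcript. The following is a minimal sketch for transcript HUMOSTRO_PEA_1_PEA_1_T30, using the start and end positions given in Tables 1208 and 1214 to 1217. It assumes that positions are 1-based and inclusive, so that a segment length is end minus start plus one; the helper name `tiles_contiguously` is hypothetical.

```python
# Minimal sketch: check that the segments listed for transcript T30 are
# contiguous, and compute each segment length.
# Assumption: positions are 1-based and inclusive.

# (segment name, start, end) on HUMOSTRO_PEA_1_PEA_1_T30,
# copied from Tables 1208, 1214, 1215, 1216 and 1217.
T30_SEGMENTS = [
    ("node_0", 1, 184),
    ("node_3", 185, 210),
    ("node_5", 211, 252),
    ("node_7", 253, 291),
    ("node_8", 292, 378),
]

def tiles_contiguously(segments):
    """True if each segment starts one position after the previous one ends."""
    ordered = sorted(segments, key=lambda seg: seg[1])
    return all(prev[2] + 1 == cur[1] for prev, cur in zip(ordered, ordered[1:]))

lengths = {name: end - start + 1 for name, start, end in T30_SEGMENTS}
total_length = sum(lengths.values())

print(tiles_contiguously(T30_SEGMENTS), lengths, total_length)
```

Under this reading, the five segments cover positions 1 to 378 without gaps, and the coding portion stated for this transcript (positions 199 to 315) lies within that range.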

Segment cluster HUMOSTRO_PEA_1_PEA_1_node_15 (SEQ ID NO:971) according to the present invention is supported by 366 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO:141) and HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO:142). Table 1218 below describes the starting and ending position of this segment on each transcript.

TABLE 1218 Segment location on transcripts

Transcript name                              Segment starting position    Segment ending position
HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO: 141)    292                          372
HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO: 142)    481                          561

Segment cluster HUMOSTRO_PEA_1_PEA_1_node_17 (SEQ ID NO:972) according to the present invention is supported by 261 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO:141) and HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO:142). Table 1219 below describes the starting and ending position of this segment on each transcript.

TABLE 1219 Segment location on transcripts

Transcript name                              Segment starting position    Segment ending position
HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO: 141)    639                          680
HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO: 142)    562                          603
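A shared segment can sit at different positions on transcripts T14 and T16 because each transcript contains a segment that the other lacks. The following minimal sketch, using positions copied from Tables 1209 to 1213, 1218 and 1219, illustrates this; it assumes 1-based inclusive positions, and the helper names are hypothetical.

```python
# Minimal sketch: compare where shared segments fall on transcripts T14 and T16.
# Assumption: positions are 1-based and inclusive.

# segment -> ((T14 start, T14 end), (T16 start, T16 end)), from the tables above.
SHARED = {
    "node_15": ((292, 372), (481, 561)),
    "node_17": ((639, 680), (562, 603)),
    "node_23": ((804, 967), (727, 890)),
    "node_31": ((1164, 1393), (1087, 1316)),
    "node_43": ((1810, 1846), (1733, 1769)),
}
NODE_10_ON_T16 = (292, 480)  # Table 1209: found only in T16
NODE_16_ON_T14 = (373, 638)  # Table 1210: found only in T14

def length(span):
    start, end = span
    return end - start + 1

def start_offset(t14_span, t16_span):
    """Start on T14 minus start on T16 for the same segment."""
    return t14_span[0] - t16_span[0]

offsets = {name: start_offset(t14, t16) for name, (t14, t16) in SHARED.items()}
print(offsets)
```

On this reading, node_15 lies 189 positions later on T16 than on T14, which equals the length of the T16-only segment node_10; the segments from node_17 onward lie 77 positions later on T14, which equals the length of node_16 (266) minus the length of node_10 (189).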

Segment cluster HUMOSTRO_PEA_1_PEA_1_node_20 (SEQ ID NO:973) according to the present invention can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO:141) and HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO:142). Table 1220 below describes the starting and ending position of this segment on each transcript.

TABLE 1220 Segment location on transcripts

Transcript name                              Segment starting position    Segment ending position
HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO: 141)    681                          688
HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO: 142)    604                          611

Segment cluster HUMOSTRO_PEA_1_PEA_1_node_21 (SEQ ID NO:974) according to the present invention is supported by 315 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO:141) and HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO:142). Table 1221 below describes the starting and ending position of this segment on each transcript.

TABLE 1221 Segment location on transcripts

Transcript name                              Segment starting position    Segment ending position
HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO: 141)    689                          738
HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO: 142)    612                          661

Segment cluster HUMOSTRO_PEA_1_PEA_1_node_22 (SEQ ID NO:975) according to the present invention is supported by 322 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO:141) and HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO:142). Table 1222 below describes the starting and ending position of this segment on each transcript.

TABLE 1222 Segment location on transcripts

Transcript name                              Segment starting position    Segment ending position
HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO: 141)    739                          803
HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO: 142)    662                          726

Segment cluster HUMOSTRO_PEA_1_PEA_1_node_24 (SEQ ID NO:976) according to the present invention is supported by 270 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO:141) and HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO:142). Table 1223 below describes the starting and ending position of this segment on each transcript.

TABLE 1223 Segment location on transcripts

Transcript name                              Segment starting position    Segment ending position
HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO: 141)    968                          1004
HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO: 142)    891                          927

Segment cluster HUMOSTRO_PEA_1_PEA_1_node_26 (SEQ ID NO:977) according to the present invention can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO:141) and HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO:142). Table 1224 below describes the starting and ending position of this segment on each transcript.

TABLE 1224 Segment location on transcripts

Transcript name                              Segment starting position    Segment ending position
HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO: 141)    1005                         1022
HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO: 142)    928                          945

Segment cluster HUMOSTRO_PEA_1_PEA_1_node_27 (SEQ ID NO:978) according to the present invention is supported by 260 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO:141) and HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO:142). Table 1225 below describes the starting and ending position of this segment on each transcript.

TABLE 1225 Segment location on transcripts

Transcript name                              Segment starting position    Segment ending position
HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO: 141)    1023                         1048
HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO: 142)    946                          971

Segment cluster HUMOSTRO_PEA_1_PEA_1_node_28 (SEQ ID NO:979) according to the present invention is supported by 273 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO:141) and HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO:142). Table 1226 below describes the starting and ending position of this segment on each transcript.

TABLE 1226 Segment location on transcripts

Transcript name                              Segment starting position    Segment ending position
HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO: 141)    1049                         1100
HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO: 142)    972                          1023

Segment cluster HUMOSTRO_PEA_1_PEA_1_node_29 (SEQ ID NO:980) according to the present invention is supported by 272 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO:141) and HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO:142). Table 1227 below describes the starting and ending position of this segment on each transcript.

TABLE 1227 Segment location on transcripts

Transcript name                              Segment starting position    Segment ending position
HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO: 141)    1101                         1151
HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO: 142)    1024                         1074

Segment cluster HUMOSTRO_PEA_1_PEA_1_node_30 (SEQ ID NO:981) according to the present invention can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO:141) and HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO:142). Table 1228 below describes the starting and ending position of this segment on each transcript.

TABLE 1228 Segment location on transcripts

Transcript name                              Segment starting position    Segment ending position
HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO: 141)    1152                         1163
HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO: 142)    1075                         1086

Segment cluster HUMOSTRO_PEA_1_PEA_1_node_32 (SEQ ID NO:982) according to the present invention is supported by 293 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO:141) and HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO:142). Table 1229 below describes the starting and ending position of this segment on each transcript.

TABLE 1229 Segment location on transcripts

Transcript name                              Segment starting position    Segment ending position
HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO: 141)    1394                         1427
HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO: 142)    1317                         1350

Segment cluster HUMOSTRO_PEA_1_PEA_1_node_34 (SEQ ID NO:983) according to the present invention is supported by 301 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO:141) and HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO:142). Table 1230 below describes the starting and ending position of this segment on each transcript.

TABLE 1230 Segment location on transcripts

Transcript name                              Segment starting position    Segment ending position
HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO: 141)    1428                         1468
HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO: 142)    1351                         1391

Segment cluster HUMOSTRO_PEA_1_PEA_1_node_36 (SEQ ID NO:984) according to the present invention is supported by 292 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO:141) and HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO:142). Table 1231 below describes the starting and ending position of this segment on each transcript.

TABLE 1231 Segment location on transcripts

Transcript name                              Segment starting position    Segment ending position
HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO: 141)    1469                         1504
HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO: 142)    1392                         1427

Segment cluster HUMOSTRO_PEA_1_PEA_1_node_37 (SEQ ID NO:985) according to the present invention is supported by 295 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO:141) and HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO:142). Table 1232 below describes the starting and ending position of this segment on each transcript.

TABLE 1232 Segment location on transcripts

Transcript name                              Segment starting position    Segment ending position
HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO: 141)    1505                         1623
HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO: 142)    1428                         1546

Segment cluster HUMOSTRO_PEA_1_PEA_1_node_38 (SEQ ID NO:986) according to the present invention can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO:141) and HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO:142). Table 1233 below describes the starting and ending position of this segment on each transcript.

TABLE 1233 Segment location on transcripts

Transcript name                              Segment starting position    Segment ending position
HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO: 141)    1624                         1634
HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO: 142)    1547                         1557

Segment cluster HUMOSTRO_PEA_1_PEA_1_node_39 (SEQ ID NO:987) according to the present invention is supported by 268 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO:141) and HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO:142). Table 1234 below describes the starting and ending position of this segment on each transcript.

TABLE 1234 Segment location on transcripts

Transcript name                              Segment starting position    Segment ending position
HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO: 141)    1635                         1725
HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO: 142)    1558                         1648

Segment cluster HUMOSTRO_PEA_1_PEA_1_node_40 (SEQ ID NO:988) according to the present invention can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO:141) and HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO:142). Table 1235 below describes the starting and ending position of this segment on each transcript.

TABLE 1235 Segment location on transcripts

Transcript name                              Segment starting position    Segment ending position
HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO: 141)    1726                         1743
HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO: 142)    1649                         1666

Segment cluster HUMOSTRO_PEA_1_PEA_1_node_41 (SEQ ID NO:989) according to the present invention can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO:141) and HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO:142). Table 1236 below describes the starting and ending position of this segment on each transcript.

TABLE 1236 Segment location on transcripts

Transcript name                              Segment starting position    Segment ending position
HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO: 141)    1744                         1749
HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO: 142)    1667                         1672

Segment cluster HUMOSTRO_PEA_1_PEA_1_node_42 (SEQ ID NO:990) according to the present invention is supported by 224 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO:141) and HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO:142). Table 1237 below describes the starting and ending position of this segment on each transcript.

TABLE 1237 Segment location on transcripts

Transcript name                              Segment starting position    Segment ending position
HUMOSTRO_PEA_1_PEA_1_T14 (SEQ ID NO: 141)    1750                         1809
HUMOSTRO_PEA_1_PEA_1_T16 (SEQ ID NO: 142)    1673                         1732

Variant protein alignment to the previously known protein:

Description for Cluster R11723

Cluster R11723 features 6 transcript(s) and 26 segment(s) of interest, the names for which are given in Tables 1238 and 1239, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 1240.

TABLE 1238 Transcripts of interest

Transcript Name       Sequence ID No.
R11723_PEA_1_T15      144
R11723_PEA_1_T17      145
R11723_PEA_1_T19      146
R11723_PEA_1_T20      147
R11723_PEA_1_T5       148
R11723_PEA_1_T6       149

TABLE 1239 Segments of interest

Segment Name            Sequence ID No.
R11723_PEA_1_node_13    991
R11723_PEA_1_node_16    992
R11723_PEA_1_node_19    993
R11723_PEA_1_node_2     994
R11723_PEA_1_node_22    995
R11723_PEA_1_node_31    996
R11723_PEA_1_node_10    997
R11723_PEA_1_node_11    998
R11723_PEA_1_node_15    999
R11723_PEA_1_node_18    1000
R11723_PEA_1_node_20    1001
R11723_PEA_1_node_21    1002
R11723_PEA_1_node_23    1003
R11723_PEA_1_node_24    1004
R11723_PEA_1_node_25    1005
R11723_PEA_1_node_26    1006
R11723_PEA_1_node_27    1007
R11723_PEA_1_node_28    1008
R11723_PEA_1_node_29    1009
R11723_PEA_1_node_3     1010
R11723_PEA_1_node_30    1011
R11723_PEA_1_node_4     1012
R11723_PEA_1_node_5     1013
R11723_PEA_1_node_6     1014
R11723_PEA_1_node_7     1015
R11723_PEA_1_node_8     1016

TABLE 1240 Proteins of interest

Protein Name          Sequence ID No.
R11723_PEA_1_P2       1409
R11723_PEA_1_P6       1410
R11723_PEA_1_P7       1411
R11723_PEA_1_P13      1412
R11723_PEA_1_P10      1413

Cluster R11723 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term “number” in the right hand column of the table and the numbers on the y-axis of FIG. 47 refer to weighted expression of ESTs in each category, as “parts per million” (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
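The “parts per million” ratio described above can be illustrated with a minimal sketch. The function name and the example counts are hypothetical and are not taken from the application, and the weighting applied to EST expression is not reproduced here; the sketch shows only the plain ratio scaled to one million.

```python
# Minimal sketch of a parts-per-million ratio.
# Assumptions: the inputs are simple EST counts for one tissue category;
# the weighting mentioned in the text is not modeled.

def parts_per_million(cluster_ests, category_ests):
    """ESTs of one cluster per million ESTs in the same category."""
    if category_ests <= 0:
        raise ValueError("category_ests must be positive")
    return 1_000_000 * cluster_ests / category_ests

# Hypothetical counts, for illustration only.
example_ppm = parts_per_million(3, 100_000)
print(example_ppm)
```

With these hypothetical counts, 3 ESTs out of 100,000 in a category correspond to 30 parts per million.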

Overall, the following results were obtained as shown with regard to the histograms in FIG. 47 and Table 1241. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: epithelial malignant tumors, a mixture of malignant tumors from different tissues and kidney malignant tumors.

TABLE 1241 Normal tissue distribution

Name of Tissue    Number
Adrenal           0
Brain             30
Epithelial        3
General           17
head and neck     0
Kidney            0
Lung              0
Breast            0
Ovary             0
Pancreas          10
Skin              0
Uterus            0

TABLE 1242 P values and ratios for expression in cancerous tissue

Name of Tissue    P1         P2         SP1        R3     SP2        R4
Adrenal           4.2e−01    4.6e−01    4.6e−01    2.2    5.3e−01    1.9
Brain             2.2e−01    2.0e−01    1.2e−02    2.8    5.0e−02    2.0
Epithelial        3.0e−05    6.3e−05    1.8e−05    6.3    3.4e−06    6.4
General           7.2e−03    4.0e−02    1.3e−04    2.1    1.1e−03    1.7
head and neck     1          5.0e−01    1          1.0    7.5e−01    1.3
Kidney            1.5e−01    2.4e−01    4.4e−03    5.4    2.8e−02    3.6
Lung              1.2e−01    1.6e−01    1          1.6    1          1.3
Breast            5.9e−01    4.4e−01    1          1.1    6.8e−01    1.5
Ovary             1.6e−02    1.3e−02    1.0e−01    3.8    7.0e−02    3.5
Pancreas          5.5e−01    2.0e−01    3.9e−01    1.9    1.4e−01    2.7
Skin              1          4.4e−01    1          1.0    1.9e−02    2.1
Uterus            1.5e−02    5.4e−02    1.9e−01    3.1    1.4e−01    2.5

As noted above, contig R11723 features 6 transcript(s), which were listed in Table 1238 above. A description of each variant protein according to the present invention is now provided.

Variant protein R11723_PEA_1_P2 (SEQ ID NO:1409) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R11723_PEA_1_T6 (SEQ ID NO:149). The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein R11723_PEA_1_P2 (SEQ ID NO:1409) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 1243 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R11723_PEA_1_P2 (SEQ ID NO:1409) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1243 Amino acid mutations

SNP position(s) on amino acid sequence    Alternative amino acid(s)    Previously known SNP?
107                                       H -> P                       Yes
70                                        G ->                         No
70                                        G -> C                       No

Variant protein R11723_PEA_1_P2 (SEQ ID NO:1409) is encoded by the following transcript(s): R11723_PEA_1_T6 (SEQ ID NO:149), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R11723_PEA_1_T6 (SEQ ID NO:149) is shown in bold; this coding portion starts at position 1716 and ends at position 2051. The transcript also has the following SNPs as listed in Table 1244 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R11723_PEA_1_P2 (SEQ ID NO:1409) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1244 Nucleic acid SNPs

SNP position on nucleotide sequence    Alternative nucleic acid    Previously known SNP?
1231                                   C -> T                      Yes
1278                                   G -> C                      Yes
1923                                   G ->                        No
1923                                   G -> T                      No
2035                                   A -> C                      Yes
2048                                   A -> C                      No
2057                                   A -> G                      Yes

Variant protein R11723_PEA_1_P6 (SEQ ID NO:1410) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R11723_PEA_1_T15 (SEQ ID NO:144). One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between R11723_PEA_1_P6 (SEQ ID NO:1410) and Q8IXM0 (SEQ ID NO:1707):

1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P6 (SEQ ID NO:1410), comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAGIMYRKSCASSAACLIASAGSPCRGLAPGREEQRALHKAGAVGGGVR (SEQ ID NO: 1741) corresponding to amino acids 1-110 of R11723_PEA_1_P6 (SEQ ID NO:1410), and a second amino acid sequence being at least 90% homologous to MYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ corresponding to amino acids 1-112 of Q8IXM0 (SEQ ID NO:1707), which also corresponds to amino acids 111-222 of R11723_PEA_1_P6 (SEQ ID NO:1410), wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a head of R11723_PEA_1_P6 (SEQ ID NO:1410), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAGIMYRKSCASSAACLIASAGSPCRGLAPGREEQRALHKAGAVGGGVR (SEQ ID NO: 1741) of R11723_PEA_1_P6 (SEQ ID NO:1410).
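The amino acid ranges stated in this comparison report can be checked against the sequences themselves. The following minimal sketch assumes 1-based inclusive numbering and simply concatenates the two sequences given above; the names `HEAD`, `SECOND` and `contiguous_spans` are hypothetical, and the sketch does not compute percent homology.

```python
# Minimal sketch: confirm that the head sequence and the Q8IXM0-derived
# sequence, placed contiguously, span amino acids 1-110 and 111-222.
# Assumption: amino acid positions are 1-based and inclusive.

# Head sequence (SEQ ID NO: 1741) as given in the comparison report.
HEAD = (
    "MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNV"
    "QDMCQKEVMEQSAGIMYRKSCASSAACLIASAGSPCRGLAPGREEQRALH"
    "KAGAVGGGVR"
)
# Second sequence, stated to correspond to amino acids 1-112 of Q8IXM0.
SECOND = (
    "MYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLRE"
    "GEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRE"
    "RQRKEKHSMRTQ"
)

def contiguous_spans(first, second):
    """Return the 1-based inclusive spans of two contiguous sequences."""
    first_span = (1, len(first))
    second_span = (len(first) + 1, len(first) + len(second))
    return first_span, second_span

spans = contiguous_spans(HEAD, SECOND)
chimera = HEAD + SECOND
print(spans, len(chimera))
```

The computed spans, 1 to 110 and 111 to 222, match the ranges recited in the report for the two contiguous amino acid sequences.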

Comparison report between R11723_PEA_1_P6 (SEQ ID NO:1410) and Q96AC2 (SEQ ID NO: 1708):

1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P6 (SEQ ID NO:1410), comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAGIMYRKSCASSAACLIASAG corresponding to amino acids 1-83 of Q96AC2 (SEQ ID NO:1708), which also corresponds to amino acids 1-83 of R11723_PEA_1_P6 (SEQ ID NO:1410), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ (SEQ ID NO: 1742) corresponding to amino acids 84-222 of R11723_PEA_1_P6 (SEQ ID NO:1410), wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of R11723_PEA_1_P6 (SEQ ID NO:1410), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ (SEQ ID NO: 1742) in R11723_PEA_1_P6 (SEQ ID NO:1410).

Comparison report between R11723_PEA_1_P6 (SEQ ID NO:1410) and Q8N2G4 (SEQ ID NO: 1709):

1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P6 (SEQ ID NO:1410), comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAGIMYRKSCASSAACLIASAG corresponding to amino acids 1-83 of Q8N2G4 (SEQ ID NO:1709), which also corresponds to amino acids 1-83 of R11723_PEA_1_P6 (SEQ ID NO:1410), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ (SEQ ID NO: 1742) corresponding to amino acids 84-222 of R11723_PEA_1_P6 (SEQ ID NO:1410), wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of R11723_PEA_1_P6 (SEQ ID NO:1410), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ (SEQ ID NO: 1742) in R11723_PEA_1_P6 (SEQ ID NO:1410).

Comparison report between R11723_PEA_(—)1_P6 (SEQ ID NO:1410) andBAC85518 (SEQ ID NO: 1710):

1. An isolated chimeric polypeptide encoding for R11723_PEA_(—)1_P6 (SEQID NO:1410), comprising a first amino acid sequence being at least 90%homologous toMWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAGIMYRKSCASSAACLIASAG corresponding to amino acids 24-106 of BAC85518 (SEQ IDNO:1710), which also corresponds to amino acids 1-83 ofR11723_PEA_(—)1_P6 (SEQ ID NO:1410), and a second amino acid sequencebeing at least 70%, optionally at least 80%, preferably at least 85%,more preferably at least 90% and most preferably at least 95% homologousto a polypeptide having the sequenceSPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ (SEQ ID NO: 1742) corresponding to amino acids 84-222 ofR11723_PEA_(—)1_P6 (SEQ ID NO:1410), wherein said first and second aminoacid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of R11723_PEA_1_P6 (SEQ ID NO:1410), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SPCRGLAPGREEQRALHKAGAVGGGVRMYAQALLVVGVLQRQAAAQHLHEHPPKLLRGHRVQERVDDRAEVEKRLREGEEDHVRPEVGPRPVVLGFGRSHDPPNLVGHPAYGQCHNNQPWADTSRRERQRKEKHSMRTQ (SEQ ID NO: 1742) in R11723_PEA_1_P6 (SEQ ID NO:1410).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
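The consensus rule stated above can be sketched in a few lines of Python. This is only an illustration: the function name and the boolean inputs are hypothetical, and the sketch does not call SignalP or any other real prediction program. It simply encodes "all signal-peptide predictors positive, no trans-membrane predictor positive".

```python
def infer_localization(signal_peptide_calls, transmembrane_calls):
    """Apply the consensus rule described in the text.

    signal_peptide_calls: one bool per signal-peptide prediction program
    transmembrane_calls:  one bool per trans-membrane prediction program
    (Both argument names are hypothetical, chosen for this sketch.)
    """
    # Without any predictions there is nothing to base a call on.
    if not signal_peptide_calls or not transmembrane_calls:
        return "undetermined"
    # Secreted: every signal-peptide program fires, no trans-membrane program fires.
    if all(signal_peptide_calls) and not any(transmembrane_calls):
        return "secreted"
    return "undetermined"


# Both signal-peptide programs agree; neither trans-membrane program fires.
print(infer_localization([True, True], [False, False]))
```

Any disagreement between the programs falls through to "undetermined" in this sketch; the application itself does not state how such cases are resolved.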

Variant protein R11723_PEA_1_P6 (SEQ ID NO:1410) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 1245 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R11723_PEA_1_P6 (SEQ ID NO:1410) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1245 Amino acid mutations

SNP position(s) on amino acid sequence   Alternative amino acid(s)   Previously known SNP?
180                                      G ->                        No
180                                      G -> C                      No
217                                      H -> P                      Yes

Variant protein R11723_PEA_1_P6 (SEQ ID NO:1410) is encoded by the following transcript(s): R11723_PEA_1_T15 (SEQ ID NO:144), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R11723_PEA_1_T15 (SEQ ID NO:144) is shown in bold; this coding portion starts at position 434 and ends at position 1099. The transcript also has the following SNPs as listed in Table 1246 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R11723_PEA_1_P6 (SEQ ID NO:1410) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1246 Nucleic acid SNPs

SNP position on nucleotide sequence   Alternative nucleic acid   Previously known SNP?
971                                   G ->                       No
971                                   G -> T                     No
1083                                  A -> C                     Yes
1096                                  A -> C                     No
1105                                  A -> G                     Yes
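The nucleotide positions in Table 1246 can be related to the amino acid positions in Table 1245 by simple codon arithmetic over the stated coding portion (positions 434 to 1099 of R11723_PEA_1_T15). The following Python sketch shows that arithmetic; the function name `codon_index` is an illustrative assumption, and the calculation assumes 1-based positions and that the coding portion begins at the first base of codon 1.

```python
def codon_index(nt_pos, cds_start, cds_end):
    """Return the 1-based codon (amino acid) number containing nt_pos,
    or None when the position lies outside the coding portion."""
    if not cds_start <= nt_pos <= cds_end:
        return None
    return (nt_pos - cds_start) // 3 + 1


# Coding portion of R11723_PEA_1_T15 as stated in the text.
CDS_START, CDS_END = 434, 1099

for pos in (971, 1083, 1096, 1105):  # SNP positions from Table 1246
    print(pos, codon_index(pos, CDS_START, CDS_END))
```

Under these assumptions, positions 971 and 1083 fall in codons 180 and 217, matching the amino acid positions listed in Table 1245. Position 1096 falls in codon 221, which does not appear in Table 1245 (that table lists only non-silent SNPs), and position 1105 lies beyond the end of the coding portion.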

Variant protein R11723_PEA_1_P7 (SEQ ID NO:1411) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R11723_PEA_1_T17 (SEQ ID NO:145). One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between R11723_PEA_1_P7 (SEQ ID NO:1411) and Q96AC2 (SEQ ID NO:1708):

1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P7 (SEQ ID NO:1411), comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAG corresponding to amino acids 1-64 of Q96AC2 (SEQ ID NO:1708), which also corresponds to amino acids 1-64 of R11723_PEA_1_P7 (SEQ ID NO:1411), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT (SEQ ID NO: 1743) corresponding to amino acids 65-93 of R11723_PEA_1_P7 (SEQ ID NO:1411), wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of R11723_PEA_1_P7 (SEQ ID NO:1411), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT (SEQ ID NO: 1743) in R11723_PEA_1_P7 (SEQ ID NO:1411).

Comparison report between R11723_PEA_1_P7 (SEQ ID NO:1411) and Q8N2G4 (SEQ ID NO:1709):

1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P7 (SEQ ID NO:1411), comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAG corresponding to amino acids 1-64 of Q8N2G4 (SEQ ID NO:1709), which also corresponds to amino acids 1-64 of R11723_PEA_1_P7 (SEQ ID NO:1411), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT (SEQ ID NO: 1743) corresponding to amino acids 65-93 of R11723_PEA_1_P7 (SEQ ID NO:1411), wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of R11723_PEA_1_P7 (SEQ ID NO:1411), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT (SEQ ID NO: 1743) in R11723_PEA_1_P7 (SEQ ID NO:1411).

Comparison report between R11723_PEA_1_P7 (SEQ ID NO:1411) and BAC85273:

1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P7 (SEQ ID NO:1411), comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MWVLG (SEQ ID NO: 1744) corresponding to amino acids 1-5 of R11723_PEA_1_P7 (SEQ ID NO:1411), a second amino acid sequence being at least 90% homologous to IAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAG corresponding to amino acids 22-80 of BAC85273, which also corresponds to amino acids 6-64 of R11723_PEA_1_P7 (SEQ ID NO:1411), and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT (SEQ ID NO: 1743) corresponding to amino acids 65-93 of R11723_PEA_1_P7 (SEQ ID NO:1411), wherein said first, second and third amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a head of R11723_PEA_1_P7 (SEQ ID NO:1411), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MWVLG (SEQ ID NO: 1744) of R11723_PEA_1_P7 (SEQ ID NO:1411).

3. An isolated polypeptide encoding for a tail of R11723_PEA_1_P7 (SEQ ID NO:1411), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT (SEQ ID NO: 1743) in R11723_PEA_1_P7 (SEQ ID NO:1411).

Comparison report between R11723_PEA_1_P7 (SEQ ID NO:1411) and BAC85518 (SEQ ID NO:1710):

1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P7 (SEQ ID NO:1411), comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAG corresponding to amino acids 24-87 of BAC85518 (SEQ ID NO:1710), which also corresponds to amino acids 1-64 of R11723_PEA_1_P7 (SEQ ID NO:1411), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT (SEQ ID NO: 1743) corresponding to amino acids 65-93 of R11723_PEA_1_P7 (SEQ ID NO:1411), wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of R11723_PEA_1_P7 (SEQ ID NO:1411), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SHCVTRLECSGTISAHCNLCLPGSNDHPT (SEQ ID NO: 1743) in R11723_PEA_1_P7 (SEQ ID NO:1411).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein R11723_PEA_1_P7 (SEQ ID NO:1411) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 1247 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R11723_PEA_1_P7 (SEQ ID NO:1411) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1247 Amino acid mutations

SNP position(s) on amino acid sequence   Alternative amino acid(s)   Previously known SNP?
67                                       C -> S                      Yes
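The "contiguous and in a sequential order" arrangement described in the comparison reports for R11723_PEA_1_P7 can be illustrated by joining the two quoted sequences and checking the stated coordinates. The Python sketch below is a consistency check on the text as quoted here, not a reproduction of the sequence listing itself; the variable and function names are illustrative assumptions.

```python
# Sequences as quoted in the comparison reports above.
HEAD = "MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSAG"  # amino acids 1-64
TAIL = "SHCVTRLECSGTISAHCNLCLPGSNDHPT"  # amino acids 65-93 (SEQ ID NO: 1743)

# "Contiguous and in a sequential order": the tail directly follows the head.
variant = HEAD + TAIL


def residue_at(sequence, position):
    """Return the residue at a 1-based amino acid position."""
    return sequence[position - 1]


print(len(HEAD), len(TAIL), len(variant))
print(residue_at(variant, 67))  # position listed in Table 1247
```

With the sequences as quoted, the head is 64 residues, the tail is 29, and the assembled variant is 93 residues long. Position 67 is a cysteine, which agrees with the C -> S substitution listed in Table 1247.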

Variant protein R11723_PEA_1_P7 (SEQ ID NO:1411) is encoded by the following transcript(s): R11723_PEA_1_T17 (SEQ ID NO:145), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R11723_PEA_1_T17 (SEQ ID NO:145) is shown in bold; this coding portion starts at position 434 and ends at position 712. The transcript also has the following SNPs as listed in Table 1248 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R11723_PEA_1_P7 (SEQ ID NO:1411) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1248 Nucleic acid SNPs

SNP position on nucleotide sequence   Alternative nucleic acid   Previously known SNP?
625                                   G -> T                     Yes
633                                   G -> C                     Yes
1303                                  C -> T                     Yes

Variant protein R11723_PEA_1_P13 (SEQ ID NO:1412) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R11723_PEA_1_T19 (SEQ ID NO:146). One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between R11723_PEA_1_P13 (SEQ ID NO:1412) and Q96AC2 (SEQ ID NO:1708):

1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P13 (SEQ ID NO:1412), comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA corresponding to amino acids 1-63 of Q96AC2 (SEQ ID NO:1708), which also corresponds to amino acids 1-63 of R11723_PEA_1_P13 (SEQ ID NO:1412), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DTKRTNTLLFEMRHFAKQLTT (SEQ ID NO: 1745) corresponding to amino acids 64-84 of R11723_PEA_1_P13 (SEQ ID NO:1412), wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of R11723_PEA_1_P13 (SEQ ID NO:1412), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DTKRTNTLLFEMRHFAKQLTT (SEQ ID NO: 1745) in R11723_PEA_1_P13 (SEQ ID NO:1412).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein R11723_PEA_1_P13 (SEQ ID NO:1412) is encoded by the following transcript(s): R11723_PEA_1_T19 (SEQ ID NO:146) and R11723_PEA_1_T5 (SEQ ID NO:148), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R11723_PEA_1_T19 (SEQ ID NO:146) is shown in bold; this coding portion starts at position 434 and ends at position 685. The transcript also has the following SNPs as listed in Table 1249 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R11723_PEA_1_P13 (SEQ ID NO:1412) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1249 Nucleic acid SNPs

SNP position on nucleotide sequence   Alternative nucleic acid   Previously known SNP?
778                                   G -> T                     Yes
786                                   G -> C                     Yes
1456                                  C -> T                     Yes

Variant protein R11723_PEA_1_P10 (SEQ ID NO:1413) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R11723_PEA_1_T20 (SEQ ID NO:147). One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between R11723_PEA_1_P10 (SEQ ID NO:1413) and Q96AC2 (SEQ ID NO:1708):

1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P10 (SEQ ID NO:1413), comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA corresponding to amino acids 1-63 of Q96AC2 (SEQ ID NO:1708), which also corresponds to amino acids 1-63 of R11723_PEA_1_P10 (SEQ ID NO:1413), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK (SEQ ID NO: 1746) corresponding to amino acids 64-90 of R11723_PEA_1_P10 (SEQ ID NO:1413), wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of R11723_PEA_1_P10 (SEQ ID NO:1413), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK (SEQ ID NO: 1746) in R11723_PEA_1_P10 (SEQ ID NO:1413).

Comparison report between R11723_PEA_1_P10 (SEQ ID NO:1413) and Q8N2G4 (SEQ ID NO:1709):

1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P10 (SEQ ID NO:1413), comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA corresponding to amino acids 1-63 of Q8N2G4 (SEQ ID NO:1709), which also corresponds to amino acids 1-63 of R11723_PEA_1_P10 (SEQ ID NO:1413), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK (SEQ ID NO: 1746) corresponding to amino acids 64-90 of R11723_PEA_1_P10 (SEQ ID NO:1413), wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of R11723_PEA_1_P10 (SEQ ID NO:1413), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK (SEQ ID NO: 1746) in R11723_PEA_1_P10 (SEQ ID NO:1413).

Comparison report between R11723_PEA_1_P10 (SEQ ID NO:1413) and BAC85273:

1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P10 (SEQ ID NO:1413), comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MWVLG (SEQ ID NO: 1744) corresponding to amino acids 1-5 of R11723_PEA_1_P10 (SEQ ID NO:1413), a second amino acid sequence being at least 90% homologous to IAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA corresponding to amino acids 22-79 of BAC85273, which also corresponds to amino acids 6-63 of R11723_PEA_1_P10 (SEQ ID NO:1413), and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK (SEQ ID NO: 1746) corresponding to amino acids 64-90 of R11723_PEA_1_P10 (SEQ ID NO:1413), wherein said first, second and third amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a head of R11723_PEA_1_P10 (SEQ ID NO:1413), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MWVLG (SEQ ID NO: 1744) of R11723_PEA_1_P10 (SEQ ID NO:1413).

3. An isolated polypeptide encoding for a tail of R11723_PEA_1_P10 (SEQ ID NO:1413), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK (SEQ ID NO: 1746) in R11723_PEA_1_P10 (SEQ ID NO:1413).

Comparison report between R11723_PEA_1_P10 (SEQ ID NO:1413) and BAC85518 (SEQ ID NO:1710):

1. An isolated chimeric polypeptide encoding for R11723_PEA_1_P10 (SEQ ID NO:1413), comprising a first amino acid sequence being at least 90% homologous to MWVLGIAATFCGLFLLPGFALQIQCYQCEEFQLNNDCSSPEFIVNCTVNVQDMCQKEVMEQSA corresponding to amino acids 24-86 of BAC85518 (SEQ ID NO:1710), which also corresponds to amino acids 1-63 of R11723_PEA_1_P10 (SEQ ID NO:1413), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK (SEQ ID NO: 1746) corresponding to amino acids 64-90 of R11723_PEA_1_P10 (SEQ ID NO:1413), wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of R11723_PEA_1_P10 (SEQ ID NO:1413), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DRVSLCHEAGVQWNNFSTLQPLPPRLK (SEQ ID NO: 1746) in R11723_PEA_1_P10 (SEQ ID NO:1413).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein R11723_PEA_1_P10 (SEQ ID NO:1413) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 1250 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R11723_PEA_1_P10 (SEQ ID NO:1413) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1250 Amino acid mutations

SNP position(s) on amino acid sequence   Alternative amino acid(s)   Previously known SNP?
66                                       V -> F                      Yes

Variant protein R11723_PEA_1_P10 (SEQ ID NO:1413) is encoded by the following transcript(s): R11723_PEA_1_T20 (SEQ ID NO:147), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R11723_PEA_1_T20 (SEQ ID NO:147) is shown in bold; this coding portion starts at position 434 and ends at position 703. The transcript also has the following SNPs as listed in Table 1251 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R11723_PEA_1_P10 (SEQ ID NO:1413) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1251 Nucleic acid SNPs

SNP position on nucleotide sequence   Alternative nucleic acid   Previously known SNP?
629                                   G -> T                     Yes
637                                   G -> C                     Yes
1307                                  C -> T                     Yes
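The coding-portion coordinates given for the four transcripts above can be cross-checked against the amino acid ranges quoted in the comparison reports (222, 93, 84 and 90 residues). The Python sketch below does that arithmetic; the dictionary layout and function name are illustrative assumptions, and positions are treated as 1-based and inclusive.

```python
# transcript: (coding start, coding end, protein length quoted in the comparison reports)
CODING_SPANS = {
    "R11723_PEA_1_T15": (434, 1099, 222),  # encodes R11723_PEA_1_P6
    "R11723_PEA_1_T17": (434, 712, 93),    # encodes R11723_PEA_1_P7
    "R11723_PEA_1_T19": (434, 685, 84),    # encodes R11723_PEA_1_P13
    "R11723_PEA_1_T20": (434, 703, 90),    # encodes R11723_PEA_1_P10
}


def codon_count(start, end):
    """Number of whole codons in an inclusive, 1-based coding span."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"span {start}-{end} is not a multiple of three")
    return length // 3


for name, (start, end, protein_len) in CODING_SPANS.items():
    codons = codon_count(start, end)
    print(name, codons, codons == protein_len)
```

Each span divides evenly into codons and the codon count equals the quoted protein length, which suggests (as an inference from the numbers, not a statement in the text) that the stated coding portions do not include the stop codon.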

As noted above, cluster R11723 features 26 segment(s), which were listed in Table 1239 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster R11723_PEA_1_node_13 (SEQ ID NO:991) according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T19 (SEQ ID NO:146), R11723_PEA_1_T5 (SEQ ID NO:148) and R11723_PEA_1_T6 (SEQ ID NO:149). Table 1252 below describes the starting and ending position of this segment on each transcript.

TABLE 1252 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
R11723_PEA_1_T19 (SEQ ID NO: 146)    624                         776
R11723_PEA_1_T5 (SEQ ID NO: 148)     624                         776
R11723_PEA_1_T6 (SEQ ID NO: 149)     658                         810

Segment cluster R11723_PEA_1_node_16 (SEQ ID NO:992) according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T17 (SEQ ID NO:145), R11723_PEA_1_T19 (SEQ ID NO:146) and R11723_PEA_1_T20 (SEQ ID NO:147). Table 1253 below describes the starting and ending position of this segment on each transcript.

TABLE 1253 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
R11723_PEA_1_T17 (SEQ ID NO: 145)    624                         1367
R11723_PEA_1_T19 (SEQ ID NO: 146)    777                         1520
R11723_PEA_1_T20 (SEQ ID NO: 147)    628                         1371
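Because the same segment occupies a different position on each transcript, a coordinate inside the segment can be carried from one transcript to another by preserving its offset from the segment start. The Python sketch below applies this to the node_16 coordinates in Table 1253; the short transcript keys and the function name are illustrative assumptions.

```python
# Start and end of segment R11723_PEA_1_node_16 on each transcript (Table 1253).
NODE_16 = {
    "T17": (624, 1367),
    "T19": (777, 1520),
    "T20": (628, 1371),
}


def transfer_position(pos, source, target, spans=NODE_16):
    """Map a 1-based position inside a shared segment from one transcript to another."""
    src_start, src_end = spans[source]
    if not src_start <= pos <= src_end:
        raise ValueError(f"position {pos} is outside the segment on {source}")
    # Keep the offset from the segment start, re-anchor it on the target transcript.
    return spans[target][0] + (pos - src_start)


for pos in (625, 633, 1303):  # SNP positions on R11723_PEA_1_T17 (Table 1248)
    print(pos, transfer_position(pos, "T17", "T19"), transfer_position(pos, "T17", "T20"))
```

Under this mapping the T17 SNP positions 625, 633 and 1303 correspond to 778, 786 and 1456 on T19 and to 629, 637 and 1307 on T20, which are the positions listed in Tables 1249 and 1251. This is consistent with the three tables describing the same SNPs within the shared segment, although the text does not say so explicitly.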

Segment cluster R11723_PEA_1_node_19 (SEQ ID NO:993) according to the present invention is supported by 45 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T5 (SEQ ID NO:148) and R11723_PEA_1_T6 (SEQ ID NO:149). Table 1254 below describes the starting and ending position of this segment on each transcript.

TABLE 1254 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
R11723_PEA_1_T5 (SEQ ID NO: 148)     835                         1008
R11723_PEA_1_T6 (SEQ ID NO: 149)     869                         1042

Segment cluster R11723_PEA_1_node_2 (SEQ ID NO:994) according to the present invention is supported by 29 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15 (SEQ ID NO:144), R11723_PEA_1_T17 (SEQ ID NO:145), R11723_PEA_1_T19 (SEQ ID NO:146), R11723_PEA_1_T20 (SEQ ID NO:147), R11723_PEA_1_T5 (SEQ ID NO:148) and R11723_PEA_1_T6 (SEQ ID NO:149). Table 1255 below describes the starting and ending position of this segment on each transcript.

TABLE 1255 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
R11723_PEA_1_T15 (SEQ ID NO: 144)    1                           309
R11723_PEA_1_T17 (SEQ ID NO: 145)    1                           309
R11723_PEA_1_T19 (SEQ ID NO: 146)    1                           309
R11723_PEA_1_T20 (SEQ ID NO: 147)    1                           309
R11723_PEA_1_T5 (SEQ ID NO: 148)     1                           309
R11723_PEA_1_T6 (SEQ ID NO: 149)     1                           309

Segment cluster R11723_PEA_1_node_22 (SEQ ID NO:995) according to the present invention is supported by 65 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T5 (SEQ ID NO:148) and R11723_PEA_1_T6 (SEQ ID NO:149). Table 1256 below describes the starting and ending position of this segment on each transcript.

TABLE 1256 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
R11723_PEA_1_T5 (SEQ ID NO: 148)     1083                        1569
R11723_PEA_1_T6 (SEQ ID NO: 149)     1117                        1603

Segment cluster R11723_PEA_1_node_31 (SEQ ID NO:996) according to the present invention is supported by 70 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15 (SEQ ID NO:144), R11723_PEA_1_T5 (SEQ ID NO:148) and R11723_PEA_1_T6 (SEQ ID NO:149). Table 1257 below describes the starting and ending position of this segment on each transcript (it should be noted that these transcripts show alternative polyadenylation).

TABLE 1257 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
R11723_PEA_1_T15 (SEQ ID NO: 144)    1060                        1295
R11723_PEA_1_T5 (SEQ ID NO: 148)     1978                        2213
R11723_PEA_1_T6 (SEQ ID NO: 149)     2012                        2247

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.

Segment cluster R11723_PEA_1_node_10 (SEQ ID NO:997) according to the present invention is supported by 38 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15 (SEQ ID NO:144), R11723_PEA_1_T17 (SEQ ID NO:145), R11723_PEA_1_T19 (SEQ ID NO:146), R11723_PEA_1_T20 (SEQ ID NO:147), R11723_PEA_1_T5 (SEQ ID NO:148) and R11723_PEA_1_T6 (SEQ ID NO:149). Table 1258 below describes the starting and ending position of this segment on each transcript.

TABLE 1258 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
R11723_PEA_1_T15 (SEQ ID NO: 144)    486                         529
R11723_PEA_1_T17 (SEQ ID NO: 145)    486                         529
R11723_PEA_1_T19 (SEQ ID NO: 146)    486                         529
R11723_PEA_1_T20 (SEQ ID NO: 147)    486                         529
R11723_PEA_1_T5 (SEQ ID NO: 148)     486                         529
R11723_PEA_1_T6 (SEQ ID NO: 149)     520                         563

Segment cluster R11723_PEA_1_node_11 (SEQ ID NO:998) according to the present invention is supported by 42 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15 (SEQ ID NO:144), R11723_PEA_1_T17 (SEQ ID NO:145), R11723_PEA_1_T19 (SEQ ID NO:146), R11723_PEA_1_T20 (SEQ ID NO:147), R11723_PEA_1_T5 (SEQ ID NO:148) and R11723_PEA_1_T6 (SEQ ID NO:149). Table 1259 below describes the starting and ending position of this segment on each transcript.

TABLE 1259 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
R11723_PEA_1_T15 (SEQ ID NO: 144)    530                         623
R11723_PEA_1_T17 (SEQ ID NO: 145)    530                         623
R11723_PEA_1_T19 (SEQ ID NO: 146)    530                         623
R11723_PEA_1_T20 (SEQ ID NO: 147)    530                         623
R11723_PEA_1_T5 (SEQ ID NO: 148)     530                         623
R11723_PEA_1_T6 (SEQ ID NO: 149)     564                         657

Segment cluster R11723_PEA_1_node_15 (SEQ ID NO:999) according to the present invention can be found in the following transcript(s): R11723_PEA_1_T20 (SEQ ID NO:147). Table 1260 below describes the starting and ending position of this segment on each transcript.

TABLE 1260 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
R11723_PEA_1_T20 (SEQ ID NO: 147)    624                         627

Segment cluster R11723_PEA_1_node_18 (SEQ ID NO:1000) according to the present invention is supported by 40 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15 (SEQ ID NO:144), R11723_PEA_1_T5 (SEQ ID NO:148) and R11723_PEA_1_T6 (SEQ ID NO:149). Table 1261 below describes the starting and ending position of this segment on each transcript.

TABLE 1261 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
R11723_PEA_1_T15 (SEQ ID NO: 144)    624                         681
R11723_PEA_1_T5 (SEQ ID NO: 148)     777                         834
R11723_PEA_1_T6 (SEQ ID NO: 149)     811                         868

Segment cluster R11723_PEA_1_node_20 (SEQ ID NO:1001) according to the present invention can be found in the following transcript(s): R11723_PEA_1_T5 (SEQ ID NO:148) and R11723_PEA_1_T6 (SEQ ID NO:149). Table 1262 below describes the starting and ending position of this segment on each transcript.

TABLE 1262 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
R11723_PEA_1_T5 (SEQ ID NO: 148)     1009                        1019
R11723_PEA_1_T6 (SEQ ID NO: 149)     1043                        1053

Segment cluster R11723_PEA_1_node_21 (SEQ ID NO: 1002) according to the present invention is supported by 36 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T5 (SEQ ID NO:148) and R11723_PEA_1_T6 (SEQ ID NO:149). Table 1263 below describes the starting and ending position of this segment on each transcript.

TABLE 1263 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
R11723_PEA_1_T5 (SEQ ID NO: 148)     1020                        1082
R11723_PEA_1_T6 (SEQ ID NO: 149)     1054                        1116

Segment cluster R11723_PEA_1_node_23 (SEQ ID NO:1003) according to the present invention is supported by 39 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T5 (SEQ ID NO:148) and R11723_PEA_1_T6 (SEQ ID NO:149). Table 1264 below describes the starting and ending position of this segment on each transcript.

TABLE 1264 Segment location on transcripts

Transcript name                      Segment starting position   Segment ending position
R11723_PEA_1_T5 (SEQ ID NO: 148)     1570                        1599
R11723_PEA_1_T6 (SEQ ID NO: 149)     1604                        1633

Segment cluster R11723_PEA_1_node_24 (SEQ ID NO:1004) according to the present invention is supported by 51 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15 (SEQ ID NO:144), R11723_PEA_1_T5 (SEQ ID NO:148) and R11723_PEA_1_T6 (SEQ ID NO:149). Table 1265 below describes the starting and ending position of this segment on each transcript.

TABLE 1265 Segment location on transcripts Segment starting SegmentTranscript name position ending position R11723_PEA_1_T15 (SEQ ID NO:144) 682 765 R11723_PEA_1_T5 (SEQ ID NO: 148) 1600 1683 R11723_PEA_1_T6(SEQ ID NO: 149) 1634 1717

Segment cluster R11723_PEA_1_node_25 (SEQ ID NO:1005) according to the present invention is supported by 54 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15 (SEQ ID NO:144), R11723_PEA_1_T5 (SEQ ID NO:148) and R11723_PEA_1_T6 (SEQ ID NO:149). Table 1266 below describes the starting and ending position of this segment on each transcript.

TABLE 1266 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
R11723_PEA_1_T15 (SEQ ID NO: 144)   766                         791
R11723_PEA_1_T5 (SEQ ID NO: 148)    1684                        1709
R11723_PEA_1_T6 (SEQ ID NO: 149)    1718                        1743

Segment cluster R11723_PEA_1_node_26 (SEQ ID NO:1006) according to the present invention is supported by 62 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15 (SEQ ID NO:144), R11723_PEA_1_T5 (SEQ ID NO:148) and R11723_PEA_1_T6 (SEQ ID NO:149). Table 1267 below describes the starting and ending position of this segment on each transcript.

TABLE 1267 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
R11723_PEA_1_T15 (SEQ ID NO: 144)   792                         904
R11723_PEA_1_T5 (SEQ ID NO: 148)    1710                        1822
R11723_PEA_1_T6 (SEQ ID NO: 149)    1744                        1856

Segment cluster R11723_PEA_1_node_27 (SEQ ID NO:1007) according to the present invention is supported by 67 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15 (SEQ ID NO:144), R11723_PEA_1_T5 (SEQ ID NO:148) and R11723_PEA_1_T6 (SEQ ID NO:149). Table 1268 below describes the starting and ending position of this segment on each transcript.

TABLE 1268 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
R11723_PEA_1_T15 (SEQ ID NO: 144)   905                         986
R11723_PEA_1_T5 (SEQ ID NO: 148)    1823                        1904
R11723_PEA_1_T6 (SEQ ID NO: 149)    1857                        1938

Segment cluster R11723_PEA_1_node_28 (SEQ ID NO:1008) according to the present invention can be found in the following transcript(s): R11723_PEA_1_T15 (SEQ ID NO:144), R11723_PEA_1_T5 (SEQ ID NO:148) and R11723_PEA_1_T6 (SEQ ID NO:149). Table 1269 below describes the starting and ending position of this segment on each transcript.

TABLE 1269 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
R11723_PEA_1_T15 (SEQ ID NO: 144)   987                         1010
R11723_PEA_1_T5 (SEQ ID NO: 148)    1905                        1928
R11723_PEA_1_T6 (SEQ ID NO: 149)    1939                        1962

Segment cluster R11723_PEA_1_node_29 (SEQ ID NO:1009) according to the present invention is supported by 69 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15 (SEQ ID NO:144), R11723_PEA_1_T5 (SEQ ID NO:148) and R11723_PEA_1_T6 (SEQ ID NO:149). Table 1270 below describes the starting and ending position of this segment on each transcript.

TABLE 1270 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
R11723_PEA_1_T15 (SEQ ID NO: 144)   1011                        1038
R11723_PEA_1_T5 (SEQ ID NO: 148)    1929                        1956
R11723_PEA_1_T6 (SEQ ID NO: 149)    1963                        1990

Segment cluster R11723_PEA_1_node_3 (SEQ ID NO:1010) according to the present invention can be found in the following transcript(s): R11723_PEA_1_T15 (SEQ ID NO:144), R11723_PEA_1_T17 (SEQ ID NO:145), R11723_PEA_1_T19 (SEQ ID NO:146), R11723_PEA_1_T20 (SEQ ID NO:147), R11723_PEA_1_T5 (SEQ ID NO:148) and R11723_PEA_1_T6 (SEQ ID NO:149). Table 1271 below describes the starting and ending position of this segment on each transcript.

TABLE 1271 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
R11723_PEA_1_T15 (SEQ ID NO: 144)   310                         319
R11723_PEA_1_T17 (SEQ ID NO: 145)   310                         319
R11723_PEA_1_T19 (SEQ ID NO: 146)   310                         319
R11723_PEA_1_T20 (SEQ ID NO: 147)   310                         319
R11723_PEA_1_T5 (SEQ ID NO: 148)    310                         319
R11723_PEA_1_T6 (SEQ ID NO: 149)    310                         319

Segment cluster R11723_PEA_1_node_30 (SEQ ID NO:1011) according to the present invention can be found in the following transcript(s): R11723_PEA_1_T15 (SEQ ID NO:144), R11723_PEA_1_T5 (SEQ ID NO:148) and R11723_PEA_1_T6 (SEQ ID NO:149). Table 1272 below describes the starting and ending position of this segment on each transcript.

TABLE 1272 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
R11723_PEA_1_T15 (SEQ ID NO: 144)   1039                        1059
R11723_PEA_1_T5 (SEQ ID NO: 148)    1957                        1977
R11723_PEA_1_T6 (SEQ ID NO: 149)    1991                        2011

Segment cluster R11723_PEA_1_node_4 (SEQ ID NO:1012) according to the present invention is supported by 25 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15 (SEQ ID NO:144), R11723_PEA_1_T17 (SEQ ID NO:145), R11723_PEA_1_T19 (SEQ ID NO:146), R11723_PEA_1_T20 (SEQ ID NO:147), R11723_PEA_1_T5 (SEQ ID NO:148) and R11723_PEA_1_T6 (SEQ ID NO:149). Table 1273 below describes the starting and ending position of this segment on each transcript.

TABLE 1273 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
R11723_PEA_1_T15 (SEQ ID NO: 144)   320                         371
R11723_PEA_1_T17 (SEQ ID NO: 145)   320                         371
R11723_PEA_1_T19 (SEQ ID NO: 146)   320                         371
R11723_PEA_1_T20 (SEQ ID NO: 147)   320                         371
R11723_PEA_1_T5 (SEQ ID NO: 148)    320                         371
R11723_PEA_1_T6 (SEQ ID NO: 149)    320                         371

Segment cluster R11723_PEA_1_node_5 (SEQ ID NO:1013) according to the present invention is supported by 26 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15 (SEQ ID NO:144), R11723_PEA_1_T17 (SEQ ID NO:145), R11723_PEA_1_T19 (SEQ ID NO:146), R11723_PEA_1_T20 (SEQ ID NO:147), R11723_PEA_1_T5 (SEQ ID NO:148) and R11723_PEA_1_T6 (SEQ ID NO:149). Table 1274 below describes the starting and ending position of this segment on each transcript.

TABLE 1274 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
R11723_PEA_1_T15 (SEQ ID NO: 144)   372                         414
R11723_PEA_1_T17 (SEQ ID NO: 145)   372                         414
R11723_PEA_1_T19 (SEQ ID NO: 146)   372                         414
R11723_PEA_1_T20 (SEQ ID NO: 147)   372                         414
R11723_PEA_1_T5 (SEQ ID NO: 148)    372                         414
R11723_PEA_1_T6 (SEQ ID NO: 149)    372                         414

Segment cluster R11723_PEA_1_node_6 (SEQ ID NO:1014) according to the present invention is supported by 27 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15 (SEQ ID NO:144), R11723_PEA_1_T17 (SEQ ID NO:145), R11723_PEA_1_T19 (SEQ ID NO:146), R11723_PEA_1_T20 (SEQ ID NO:147), R11723_PEA_1_T5 (SEQ ID NO:148) and R11723_PEA_1_T6 (SEQ ID NO:149). Table 1275 below describes the starting and ending position of this segment on each transcript.

TABLE 1275 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
R11723_PEA_1_T19 (SEQ ID NO: 146)   415                         446
R11723_PEA_1_T20 (SEQ ID NO: 147)   415                         446
R11723_PEA_1_T5 (SEQ ID NO: 148)    415                         446
R11723_PEA_1_T6 (SEQ ID NO: 149)    415                         446

Segment cluster R11723_PEA_1_node_7 (SEQ ID NO:1015) according to the present invention is supported by 29 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T15 (SEQ ID NO:144), R11723_PEA_1_T17 (SEQ ID NO:145), R11723_PEA_1_T19 (SEQ ID NO:146), R11723_PEA_1_T20 (SEQ ID NO:147), R11723_PEA_1_T5 (SEQ ID NO:148) and R11723_PEA_1_T6 (SEQ ID NO:149). Table 1276 below describes the starting and ending position of this segment on each transcript.

TABLE 1276 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
R11723_PEA_1_T15 (SEQ ID NO: 144)   447                         485
R11723_PEA_1_T17 (SEQ ID NO: 145)   447                         485
R11723_PEA_1_T19 (SEQ ID NO: 146)   447                         485
R11723_PEA_1_T20 (SEQ ID NO: 147)   447                         485
R11723_PEA_1_T5 (SEQ ID NO: 148)    447                         485
R11723_PEA_1_T6 (SEQ ID NO: 149)    447                         485

Segment cluster R11723_PEA_1_node_8 (SEQ ID NO:1016) according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R11723_PEA_1_T6 (SEQ ID NO:149). Table 1277 below describes the starting and ending position of this segment on each transcript.

TABLE 1277 Segment location on transcripts

Transcript name                     Segment starting position   Segment ending position
R11723_PEA_1_T6 (SEQ ID NO: 149)    486                         519

Variant protein alignment to the previously known protein:

It should be noted that the nucleotide transcript sequence of the known protein (PSEC, also referred to herein as the “wild type” or WT protein) features at least one SNP that appears to affect the coding region, in addition to certain silent SNPs (this SNP does not have an effect on the R11723_PEA_1_T5 (SEQ ID NO:148) splice variant sequence): “G→”, resulting in a missing nucleotide (affects amino acids from position 91 onwards). The missing nucleotide creates a frame shift, resulting in a new protein. This SNP was not previously identified and is supported by 5 ESTs out of ~70 ESTs in this exon.

It should be noted that the variants of this cluster are variants of the hypothetical protein PSEC0181 (referred to herein as “PSEC”). Furthermore, use of the known protein (WT protein) for detection of lung cancer, alone or in combination with one or more variants of this cluster and/or of any other cluster and/or of any known marker, also comprises an embodiment of the present invention.

Expression of R11723 Transcripts which are Detectable by Amplicon as Depicted in Sequence Name R11723 seg13 (SEQ ID NO: 1684) in Normal and Cancerous Lung Tissues

Expression of transcripts detectable by or according to R11723 seg13, the R11723 seg13 amplicon (SEQ ID NO: 1684) and the R11723 seg13F (SEQ ID NO: 1682) and R11723 seg13R (SEQ ID NO: 1683) primers was measured by real time PCR. In parallel, the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323 (SEQ ID NO:1713); amplicon: PBGD-amplicon, SEQ ID NO:334), HPRT1 (GenBank Accession No. NM_000194 (SEQ ID NO:1714); amplicon: HPRT1-amplicon, SEQ ID NO:1297), SDHA (GenBank Accession No. NM_004168 (SEQ ID NO:1712); amplicon: SDHA-amplicon, SEQ ID NO:331) and Ubiquitin (GenBank Accession No. BC000449 (SEQ ID NO:1711); amplicon: Ubiquitin-amplicon, SEQ ID NO:328), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 47-50, 90-93, 96-99, Table 2 “Tissue samples in testing panel”, above), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples.
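The normalization described above can be sketched as follows. This is an illustrative sketch only: the sample names and quantities are hypothetical, the function names are assumptions, and the code is not the analysis software actually used.

```python
from statistics import geometric_mean, median


def normalized_quantity(amplicon_qty, housekeeping_qtys):
    """Amplicon quantity divided by the geometric mean of the housekeeping gene quantities."""
    return amplicon_qty / geometric_mean(housekeeping_qtys)


def fold_up_regulation(normalized, normal_ids):
    """Divide every normalized quantity by the median of the normal (PM) samples."""
    reference = median(normalized[s] for s in normal_ids)
    return {s: q / reference for s, q in normalized.items()}


# Hypothetical data: sample -> (amplicon quantity, [PBGD, HPRT1, SDHA, Ubiquitin])
raw = {
    "normal_1": (2.0, [1.0, 4.0, 2.0, 2.0]),
    "normal_2": (4.0, [1.0, 4.0, 2.0, 2.0]),
    "normal_3": (6.0, [1.0, 4.0, 2.0, 2.0]),
    "tumor_1": (24.0, [1.0, 4.0, 2.0, 2.0]),
}

normalized = {s: normalized_quantity(a, hk) for s, (a, hk) in raw.items()}
folds = fold_up_regulation(normalized, ["normal_1", "normal_2", "normal_3"])

# Samples showing at least 5-fold over-expression relative to the normal median.
over_expressed = [s for s, f in folds.items() if f >= 5]
print(folds, over_expressed)
```

The geometric mean is used for the housekeeping genes so that no single highly expressed reference gene dominates the normalization factor.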

FIG. 48 is a histogram showing over-expression of the above-indicated transcripts in cancerous lung samples relative to the normal samples. The number and percentage of samples that exhibit at least 5 fold over-expression, out of the total number of samples tested, is indicated at the bottom.

As is evident from FIG. 48, the expression of transcripts detectable by the above amplicon(s) in cancer samples was higher than in the non-cancerous samples (Sample Nos. 47-50, 90-93, 96-99, Table 2 “Tissue samples in testing panel”). Notably, an over-expression of at least 5 fold was found in 10 out of 15 adenocarcinoma samples and in 4 out of 8 small cell carcinoma samples.

Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: R11723 seg13F forward primer (SEQ ID NO: 1682); and R11723 seg13R reverse primer (SEQ ID NO: 1683).

The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: R11723 seg13 (SEQ ID NO: 1684).

R11723seg13F (SEQ ID NO: 1682)
ACACTAAAAGAACAAACACCTTGCTC
R11723seg13R (SEQ ID NO: 1683)
TCCTCAGAAGGCACATGAAAGA
R11723seg13-amplicon (SEQ ID NO: 1684)
ACACTAAAAGAACAAACACCTTGCTCTTCGAGATGAGACATTTTGCCAAGCAGTTGACCACTTAGTTCTCAAGAAGCAACTATCTCTTTCATGTGCCTTCTGAGGA
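As a consistency check on the sequences listed above, a primer pair and its amplicon can be compared directly: the amplicon is expected to start with the forward primer and to end with the reverse complement of the reverse primer. The following is a minimal sketch; the helper function and variable names are illustrative assumptions.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written with the letters A, C, G, T."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


FORWARD = "ACACTAAAAGAACAAACACCTTGCTC"   # R11723seg13F (SEQ ID NO: 1682)
REVERSE = "TCCTCAGAAGGCACATGAAAGA"       # R11723seg13R (SEQ ID NO: 1683)
AMPLICON = (                             # R11723seg13-amplicon (SEQ ID NO: 1684)
    "ACACTAAAAGAACAAACACCTTGCTC"
    "TTCGAGATGAGACATTTTGCCAAGCAGTTGACC"
    "ACTTAGTTCTCAAGAAGCAACTATC"
    "TCTTTCATGTGCCTTCTGAGGA"
)

starts_with_forward = AMPLICON.startswith(FORWARD)
ends_with_reverse = AMPLICON.endswith(reverse_complement(REVERSE))
print(starts_with_forward, ends_with_reverse)
```

The same check can be applied to any other primer pair and amplicon given in this description.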

Expression of R11723 Transcripts which are Detectable by Amplicon as Depicted in Sequence Name R11723seg13 (SEQ ID NO: 1684) in Different Normal Tissues

Expression of R11723 transcripts detectable by or according to the R11723seg13 amplicon (SEQ ID NO: 1684) and the R11723seg13F (SEQ ID NO: 1682) and R11723seg13R (SEQ ID NO: 1683) primers was measured by real time PCR. In parallel, the expression of four housekeeping genes, RPL19 (GenBank Accession No. NM_000981 (SEQ ID NO:1715); RPL19 amplicon, SEQ ID NO:1630), TATA box (GenBank Accession No. NM_003194 (SEQ ID NO:1716); TATA amplicon, SEQ ID NO:1633), UBC (GenBank Accession No. BC000449 (SEQ ID NO:1711); amplicon: Ubiquitin-amplicon, SEQ ID NO:328) and SDHA (GenBank Accession No. NM_004168 (SEQ ID NO:1712); amplicon: SDHA-amplicon, SEQ ID NO:331), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the ovary samples (Sample Nos. 18-20, Table 2 “Tissue samples in normal panel” above), to obtain a value of relative expression of each sample relative to the median of the ovary samples.

R11723seg13F (SEQ ID NO: 1682)
ACACTAAAAGAACAAACACCTTGCTC
R11723seg13R (SEQ ID NO: 1683)
TCCTCAGAAGGCACATGAAAGA
R11723seg13-amplicon (SEQ ID NO: 1684)
ACACTAAAAGAACAAACACCTTGCTCTTCGAGATGAGACATTTTGCCAAGCAGTTGACCACTTAGTTCTCAAGAAGCAACTATCTCTTTCATGTGCCTTCTGAGGA

The results are presented in FIG. 49, showing the expression of R11723 transcripts which are detectable by amplicon as depicted in sequence name R11723seg13 (SEQ ID NO: 1684) in different normal tissues.

Expression of R11723 Transcripts, which are Detectable by Amplicon as Depicted in Sequence Name R11723 junc11-18 (SEQ ID NO: 1687) in Normal and Cancerous Lung Tissues

Expression of transcripts detectable by or according to junc11-18, the R11723 junc11-18 amplicon (SEQ ID NO: 1687) and the R11723 junc11-18F (SEQ ID NO: 1685) and R11723 junc11-18R (SEQ ID NO: 1686) primers was measured by real time PCR (this junction is found in the known protein sequence or “wild type” (WT) sequence, also termed herein the PSEC sequence). In parallel, the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323 (SEQ ID NO:1713); amplicon: PBGD-amplicon, SEQ ID NO:334), HPRT1 (GenBank Accession No. NM_000194 (SEQ ID NO:1714); amplicon: HPRT1-amplicon, SEQ ID NO:1297), SDHA (GenBank Accession No. NM_004168 (SEQ ID NO:1712); amplicon: SDHA-amplicon, SEQ ID NO:331) and Ubiquitin (GenBank Accession No. BC000449 (SEQ ID NO:1711); amplicon: Ubiquitin-amplicon, SEQ ID NO:328), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 47-50, 90-93, 96-99, Table 2, above: “Tissue samples in lung cancer testing panel”), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples.

FIG. 50 is a histogram showing over-expression of the above-indicated transcripts in cancerous lung samples relative to the normal samples. Values represent the average of duplicate experiments. Error bars indicate the minimal and maximal values obtained.

As is evident from FIG. 50, the expression of transcripts detectable by the above amplicon in cancer samples was higher than in the non-cancerous samples (Sample Nos. 47-50, 90-93, 96-99, Table 2 “Tissue samples in lung cancer testing panel”). Notably, an over-expression of at least 5 fold was found in 11 out of 15 adenocarcinoma samples, 4 out of 16 squamous cell carcinoma samples, 1 out of 4 large cell carcinoma samples and 5 out of 8 small cell carcinoma samples.

Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: R11723 junc11-18F forward primer (SEQ ID NO: 1685); and R11723 junc11-18R reverse primer (SEQ ID NO: 1686).

The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: R11723 junc11-18 (SEQ ID NO: 1687).

R11723junc11-18F (SEQ ID NO: 1685)
AGTGATGGAGCAAAGTGCCG
R11723junc11-18R (SEQ ID NO: 1686)
CAGCAGCTGATGCAAACTGAG
R11723junc11-18-amplicon (SEQ ID NO: 1687)
AGTGATGGAGCAAAGTGCCGGGATCATGTACCGCAAGTCCTGTGCATCATCAGCGGCCTGTCTCATCGCCTCTGCCGGGTACCAGTCCTTCTGCTCCCCAGGGAAACTGAACTCAGTTTGCATCAGCTGCTG

Expression of R11723 Transcripts, which were Detected by Amplicon as Depicted in the Sequence Name R11723 junc11-18 (SEQ ID NO: 1687) in Different Normal Tissues

Expression of R11723 transcripts detectable by or according to the R11723 junc11-18 amplicon (SEQ ID NO: 1687) and the R11723 junc11-18F (SEQ ID NO: 1685) and R11723 junc11-18R (SEQ ID NO: 1686) primers was measured by real time PCR. In parallel, the expression of four housekeeping genes, RPL19 (GenBank Accession No. NM_000981 (SEQ ID NO:1715); RPL19 amplicon, SEQ ID NO:1630), TATA box (GenBank Accession No. NM_003194 (SEQ ID NO:1716); TATA amplicon, SEQ ID NO:1633), UBC (GenBank Accession No. BC000449 (SEQ ID NO:1711); amplicon: Ubiquitin-amplicon, SEQ ID NO:328) and SDHA (GenBank Accession No. NM_004168 (SEQ ID NO:1712); amplicon: SDHA-amplicon, SEQ ID NO:331), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the ovary samples (Sample Nos. 18-20, Table 3 above), to obtain a value of relative expression of each sample relative to the median of the ovary samples.

R11723junc11-18F (SEQ ID NO: 1685)
AGTGATGGAGCAAAGTGCCG
R11723junc11-18R (SEQ ID NO: 1686)
CAGCAGCTGATGCAAACTGAG
R11723junc11-18-amplicon (SEQ ID NO: 1687)
AGTGATGGAGCAAAGTGCCGGGATCATGTACCGCAAGTCCTGTGCATCATCAGCGGCCTGTCTCATCGCCTCTGCCGGGTACCAGTCCTTCTGCTCCCCAGGGAAACTGAACTCAGTTTGCATCAGCTGCTG

The results are demonstrated in FIG. 73, showing the expression of R11723 transcripts, which were detected by amplicon as depicted in the sequence name R11723 junc11-18 (SEQ ID NO: 1687) in different normal tissues.

Cloning of this Variant

Full Length Validation

RNA Preparation

Human adult papillary adenocarcinoma ovary RNA pool (lot #ILS1408) was obtained from ABS (http://www.absbioreagents.com, Wilmington, Del. 19801, USA). Total RNA samples were treated with DNaseI (Ambion Cat #1906).

RT PCR

RT Preparation

Purified RNA (1 ug) was mixed with 150 ng Random Hexamer primers (Invitrogen Cat #48190-011) and 500 uM dNTP (Takara, Cat #B9501-1) in a total volume of 15.6 ul DEPC-H₂O (Beit Haemek, Cat #01-852-1A). The mixture was incubated for 5 min at 65° C. and then quickly chilled on ice. Thereafter, 5 ul of 5× Superscript II first strand buffer (Invitrogen, Cat #Y00146), 2.4 ul 0.1M DTT (Invitrogen, Cat #Y00147) and 40 units RNasin (Promega, Cat #N251A) were added, and the mixture was incubated for 2 min at 42° C. Then, 1 ul (200 units) of Superscript II (Invitrogen, Cat #18064-022) was added and the reaction was incubated for 50 min at 42° C. and then inactivated at 70° C. for 15 min. The resulting cDNA was diluted 1:20 in TE buffer (10 mM Tris pH=8, 1 mM EDTA pH=8).

PCR Amplification and Analysis

cDNA (5 ul), prepared as described above, was used as a template in PCR reactions. The amplification was done using AccuPower PCR PreMix (Bioneer, Korea, Cat #K2016), under the following conditions: 1 ul of each primer (10 uM)

PSECfor-TGCTGTCGCCTCCTCTGATG (SEQ ID NO: 1777)
PSECrev-CCTCAGAAGGCACATGAAAG (SEQ ID NO: 1778)

plus 13 ul H₂O were added into an AccuPower PCR PreMix tube, with a reaction program of 5 minutes at 94° C.; 35 cycles of: [30 seconds at 94° C., 30 seconds at 52° C., 40 seconds at 72° C.] and 10 minutes at 72° C. At the end of the PCR amplification, products were analyzed on agarose gels stained with ethidium bromide and visualized with UV light. The PCR product was extracted from the gel using the QiaQuick™ gel extraction kit (Qiagen™, Cat #28706). The extracted DNA product (FIG. 79) was sequenced by direct sequencing using the gene specific primers from above (Hy-Labs, Israel), resulting in the expected sequence of PSEC variant R11723_PEA_1_T5 (SEQ ID NO:148) (FIG. 80).
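For illustration, the reaction program of the full length validation PCR (5 minutes at 94° C.; 35 cycles of 30 seconds at 94° C., 30 seconds at 52° C. and 40 seconds at 72° C.; 10 minutes at 72° C.) can be written down as data and its total hold time computed. This is a sketch only: the step labels are the conventional reading of such a program and are an assumption, and ramp times between temperatures are ignored.

```python
# Each step: (label, temperature in degrees C, hold time in seconds).
# The labels are assumed; the source gives only temperatures and times.
INITIAL_STEP = ("initial denaturation", 94, 5 * 60)
CYCLE_STEPS = [
    ("denaturation", 94, 30),
    ("annealing", 52, 30),
    ("extension", 72, 40),
]
CYCLE_COUNT = 35
FINAL_STEP = ("final extension", 72, 10 * 60)


def total_hold_seconds(initial, cycle, cycles, final):
    """Sum of programmed hold times; ramping between temperatures is not included."""
    per_cycle = sum(seconds for _, _, seconds in cycle)
    return initial[2] + cycles * per_cycle + final[2]


total_seconds = total_hold_seconds(INITIAL_STEP, CYCLE_STEPS, CYCLE_COUNT, FINAL_STEP)
print(total_seconds, round(total_seconds / 60, 1))
```

The actual run time on a thermocycler would be longer than this figure, since heating and cooling between steps take additional time.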

It was concluded that the predicted PSEC variant R11723_PEA_1_T5 (SEQ ID NO:148) is indeed a naturally expressed variant in human adult papillary adenocarcinoma ovary tissue, as shown in FIG. 79.

Cloning of PSEC variant R11723_PEA_1_T5 (SEQ ID NO:148) into bacterial expression vector

The PSEC splice variant R11723_PEA_1_T5 (SEQ ID NO:148) coding sequence was prepared for cloning by PCR amplification using the fragment described above as template and Platinum Pfx DNA polymerase (Invitrogen Cat #11708021) under the following conditions: 5 ul Amplification ×10 buffer (Invitrogen Cat #11708021); 2 ul PCR product from above; 1 ul dNTPs (10 mM each); 1 ul MgSO4 (50 mM); 5 ul enhancer solution (Invitrogen Cat #11708021); 33 ul H₂O; 1 ul of each primer (10 uM) and 1.25 units of Platinum Pfx DNA polymerase (Invitrogen Cat #11708021) in a total reaction volume of 50 ul, with a reaction program of 3 minutes at 94° C.; 29 cycles of: [30 seconds at 94° C., 30 seconds at 58° C., 40 seconds at 68° C.] and 7 minutes at 68° C. The primers listed below include specific sequences of the nucleotide sequence corresponding to the splice variant and NheI and HindIII restriction sites.

PSEC NheIfor-ATAGCTAGCATGTGGGTCCTAGGCATCGCGG (SEQ ID NO: 1779)
PSEC HindIIIrev-CCCAAGCTTCTAAGTGGTCAACTGCTTGGC (SEQ ID NO: 1780)
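The presence of the restriction sites in the cloning primers can be checked with a short sketch such as the one below. It relies on the standard recognition sequences of NheI (GCTAGC) and HindIII (AAGCTT); the helper name is illustrative, and reading the trinucleotide after the HindIII site as the complement of a TAG stop codon is an interpretation rather than a statement from the text.

```python
NHEI_SITE = "GCTAGC"      # NheI recognition sequence
HINDIII_SITE = "AAGCTT"   # HindIII recognition sequence

NHEI_FOR = "ATAGCTAGCATGTGGGTCCTAGGCATCGCGG"     # PSEC NheIfor (SEQ ID NO: 1779)
HINDIII_REV = "CCCAAGCTTCTAAGTGGTCAACTGCTTGGC"   # PSEC HindIIIrev (SEQ ID NO: 1780)


def site_offset(primer: str, site: str) -> int:
    """0-based offset of the first occurrence of site in primer, or -1 if absent."""
    return primer.find(site)


nhei_at = site_offset(NHEI_FOR, NHEI_SITE)
hindiii_at = site_offset(HINDIII_REV, HINDIII_SITE)

# Three nucleotides immediately downstream of each restriction site.
after_nhei = NHEI_FOR[nhei_at + len(NHEI_SITE):][:3]           # ATG start codon
after_hindiii = HINDIII_REV[hindiii_at + len(HINDIII_SITE):][:3]  # CTA, complement of TAG
print(nhei_at, after_nhei, hindiii_at, after_hindiii)
```

In both primers the restriction site is preceded by three extra nucleotides at the 5′ end, followed by the gene specific sequence.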

The PCR product was then double digested with NheI and HindIII (New England Biolabs (UK) LTD) (FIG. 81), and inserted into pRSET-A (Invitrogen, Cat #V351-20), previously digested with the same enzymes, in-frame to an N-terminal 6His-tag, to give HisPSEC T5 pRSET (FIG. 82). The coding sequence encodes for a protein having the 6His-tag at the N′ end (6 His residues in a row at one end of the protein), and 8 additional amino acids encoded by the pRSET vector.

The sequence of the PSEC insert in the final plasmid, as well as its flanking regions, was verified by sequencing and found to be identical to the desired sequences. The complete sequence of HisPSEC T5 pRSETA, including the sequenced regions, is shown in FIG. 84.

FIG. 83 shows the translated sequence of PSEC variant R11723_PEA_1_T5 (SEQ ID NO:148).

Bacterial Culture and Induction of Protein Expression

HisPSEC pRSETA DNA was transformed into competent DH5α cells (Invitrogen Cat #18258-012). Ampicillin resistant transformants were screened and positive clones were further analyzed by restriction enzyme digestion and sequence verification.

In order to express the recombinant protein, HisPSEC pRSETA DNA was further transformed into competent BL21Gold cells (Stratagene Cat #230134) and BL21star cells (Invitrogen Cat #44-0054). Ampicillin resistant transformants were screened and positive clones were selected.

Bacterial cells containing the HisPSEC T5 pRSET vector or empty pRSET vector (as negative control) were grown in LB medium, supplemented with ampicillin (50 ug/ml) and chloramphenicol (34 ug/ml), until O.D. 600 nm reached 0.55. This value was reached in about 3 hours. 1 mM IPTG (Roche, Cat #724815) was added and the cells were grown at 37° C. overnight. 1 ml aliquots of each culture were removed for gel analysis at time zero, 3 hrs after induction and following overnight incubation (T0, T3 and TO/N, respectively).

Expression Results

The time course of small-scale expression of PSEC in BL21Gold is demonstrated in FIG. 85. The expression of a recombinant protein with the appropriate molecular weight (9.2 kDa) was visualized by Western Blot with anti-His antibodies (BD Clontech, Ref 631212, FIG. 85), but not by Coomassie staining (data not shown). A similar expression pattern was obtained with BL21star as well (data not shown).

These results show that the protein encoded by PSEC variant R11723_PEA_1_T5 (SEQ ID NO:148) is indeed expressed in bacterial cells.

Description for Cluster R16276

Cluster R16276 features 1 transcript(s) and 5 segment(s) of interest, the names for which are given in Tables 1278 and 1279, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 1280.

TABLE 1278 Transcripts of interest

Transcript Name       Sequence ID No.
R16276_PEA_1_T6       150

TABLE 1279 Segments of interest

Segment Name          Sequence ID No.
R16276_PEA_1_node_0   1017
R16276_PEA_1_node_6   1018
R16276_PEA_1_node_1   1019
R16276_PEA_1_node_4   1020
R16276_PEA_1_node_5   1021

TABLE 1280 Proteins of interest

Protein Name          Sequence ID No.   Corresponding Transcript(s)
R16276_PEA_1_P7       1414              R16276_PEA_1_T6 (SEQ ID NO: 150)

These sequences are variants of the known protein NOV protein homolog precursor (SwissProt accession identifier NOV_HUMAN; known also according to the synonyms NovH; Nephroblastoma overexpressed gene protein homolog), SEQ ID NO:1463, referred to herein as the previously known protein.

Protein NOV protein homolog precursor (SEQ ID NO:1463) is known or believed to have the following function(s): Immediate-early protein, likely to play a role in cell growth regulation (By similarity). The sequence for protein NOV protein homolog precursor is given at the end of the application, as “NOV protein homolog precursor amino acid sequence”. Known polymorphisms for this sequence are as shown in Table 1281.

TABLE 1281 Amino acid mutations for Known Protein

SNP position(s) on amino acid sequence   Comment
97                                       N -> K

Protein NOV protein homolog precursor (SEQ ID NO:1463) localization is believed to be secreted.

The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: regulation of cell growth, which are annotation(s) related to Biological Process; insulin-like growth factor binding; growth factor, which are annotation(s) related to Molecular Function; and extracellular, which are annotation(s) related to Cellular Component.

The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <dot expasy dot ch/sprot/>; or Locuslink, available from <dot ncbi dot nlm dot nih dot gov/projects/LocusLink/>.

Cluster R16276 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term “number” in the right hand column of the table and the numbers on the y-axis of FIG. 51 refer to weighted expression of ESTs in each category, as “parts per million” (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
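The “parts per million” ratio described above can be sketched as a simple calculation. The counts in the example are hypothetical, the function name is an assumption, and the weighting applied to the ESTs in the actual analysis is not modelled here.

```python
def parts_per_million(cluster_est_count: float, category_est_count: float) -> float:
    """Ratio of the ESTs of one cluster to all ESTs in a tissue category, scaled to one million."""
    if category_est_count <= 0:
        raise ValueError("category_est_count must be positive")
    return 1_000_000 * cluster_est_count / category_est_count


# Hypothetical example: 3 ESTs of a cluster among 125,000 ESTs from one tissue category.
example_ppm = parts_per_million(3, 125_000)
print(example_ppm)
```

Expressing the ratio per million ESTs makes values comparable between tissue categories that differ widely in the total number of ESTs sampled.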

Overall, the following results were obtained as shown with regard to the histograms in FIG. 51 and Table 1282. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: lung malignant tumors.

TABLE 1282 Normal tissue distribution

Name of Tissue   Number
Adrenal          977
Bone             32
Brain            24
Colon            0
Epithelial       63
General          43
Kidney           24
Liver            341
Lung             0
Breast           0
Muscle           20
Ovary            0
Pancreas         0
Prostate         24
Skin             13
Stomach          146
Uterus           0

TABLE 1283 P values and ratios for expression in cancerous tissue

Name of Tissue   P1        P2        SP1       R3     SP2       R4
Adrenal          5.9e−01   6.2e−01   1         0.2    9.9e−01   0.2
Bone             5.5e−01   7.3e−01   1         0.8    1         0.6
Brain            2.8e−01   4.4e−01   6.8e−01   0.9    8.9e−01   0.6
Colon            2.6e−01   3.3e−01   4.9e−01   2.0    5.9e−01   1.7
Epithelial       2.6e−01   2.9e−01   9.7e−01   0.6    1         0.5
General          4.1e−01   6.8e−01   9.4e−01   0.7    1         0.5
Kidney           8.3e−01   7.7e−01   6.2e−01   1.2    5.3e−01   1.4
Liver            9.1e−01   7.5e−01   1         0.1    1         0.1
Lung             2.3e−02   9.1e−02   8.0e−04   10.5   2.1e−02   5.1
Breast           5.9e−01   6.7e−01   6.9e−01   1.5    8.2e−01   1.2
Muscle           5.2e−01   6.1e−01   2.7e−01   3.2    6.3e−01   1.2
Ovary            6.2e−01   6.5e−01   6.8e−01   1.5    7.7e−01   1.3
Pancreas         3.3e−01   4.4e−01   4.2e−01   2.4    5.3e−01   1.9
Prostate         9.3e−01   9.4e−01   1         0.5    9.4e−01   0.6
Skin             9.2e−01   6.8e−01   1         0.5    4.1e−01   1.1
Stomach          5.0e−01   7.3e−01   5.0e−01   0.6    9.7e−01   0.4
Uterus           2.4e−01   1.6e−01   2.9e−01   2.5    4.1e−01   2.0

As noted above, cluster R16276 features 1 transcript(s), which were listed in Table 1278 above. These transcript(s) encode for protein(s) which are variant(s) of protein NOV protein homolog precursor (SEQ ID NO:1463). A description of each variant protein according to the present invention is now provided.

Variant protein R16276_PEA_1_P7 (SEQ ID NO:1414) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) R16276_PEA_1_T6 (SEQ ID NO:150). An alignment is given to the known protein (NOV protein homolog precursor (SEQ ID NO:1463)) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:

Comparison report between R16276_PEA_1_P7 (SEQ ID NO:1414) and NOV_HUMAN (SEQ ID NO:1463):

1. An isolated chimeric polypeptide encoding for R16276_PEA_1_P7 (SEQ ID NO:1414), comprising a first amino acid sequence being at least 90% homologous to MQSVQSTSFCLRKQCLCLTFLLLHLLGQVAATQRCPPQCPG corresponding to amino acids 1-41 of NOV_HUMAN (SEQ ID NO:1463), which also corresponds to amino acids 1-41 of R16276_PEA_1_P7 (SEQ ID NO:1414), a bridging amino acid Q corresponding to amino acid 42 of R16276_PEA_1_P7 (SEQ ID NO:1414), a second amino acid sequence being at least 90% homologous to CPATPPTCAPGVRAVLDGCSCCLVCARQRGESCSDLEPCDESSGLYCDRSADPSNQTGICT corresponding to amino acids 43-103 of NOV_HUMAN (SEQ ID NO:1463), which also corresponds to amino acids 43-103 of R16276_PEA_1_P7 (SEQ ID NO:1414), and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GNPAPSAV (SEQ ID NO: 1748) corresponding to amino acids 104-111 of R16276_PEA_1_P7 (SEQ ID NO:1414), wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of R16276_PEA_1_P7 (SEQ ID NO:1414), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GNPAPSAV (SEQ ID NO: 1748) in R16276_PEA_1_P7 (SEQ ID NO:1414).
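The contiguous arrangement described in the comparison report can be illustrated by concatenating the four parts and checking the stated amino acid positions. This is a sketch of one fully identical instance only; the variable and function names are illustrative, and the percentage homology ranges recited above are not modelled.

```python
FIRST = "MQSVQSTSFCLRKQCLCLTFLLLHLLGQVAATQRCPPQCPG"   # amino acids 1-41
BRIDGE = "Q"                                            # amino acid 42
SECOND = (                                              # amino acids 43-103
    "CPATPPTCAPGVRAVLDGCSCCLVCARQRGESCSDLEPCDESSGLYCDRSADPSNQTGICT"
)
TAIL = "GNPAPSAV"                                       # amino acids 104-111 (SEQ ID NO: 1748)

# Contiguous, in sequential order: first sequence, bridging residue, second sequence, tail.
variant = FIRST + BRIDGE + SECOND + TAIL


def residues(seq: str, start: int, end: int) -> str:
    """Sub-sequence for 1-based inclusive amino acid positions."""
    return seq[start - 1:end]


print(len(variant), residues(variant, 42, 42), residues(variant, 104, 111))
```

The part lengths (41, 1, 61 and 8 residues) add up to the 111 amino acids implied by the position ranges given in the report.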

The location of the variant protein was determined according to resultsfrom a number of different software programs and analyses, includinganalyses from SignalP and other specialized programs. The variantprotein is believed to be located as follows with regard to the cell:secreted. The protein localization is believed to be secreted becauseboth signal-peptide prediction programs predict that this protein has asignal peptide, and neither trans-membrane region prediction programpredicts that this protein has a trans-membrane region.

Variant protein R16276_PEA_1_P7 (SEQ ID NO:1414) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 1284 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R16276_PEA_1_P7 (SEQ ID NO:1414) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1284 Amino acid mutations
SNP position(s) on amino acid sequence   Alternative amino acid(s)   Previously known SNP?
42                                       Q -> R                      Yes

The glycosylation sites of variant protein R16276_PEA_1_P7 (SEQ ID NO:1414), as compared to the known protein NOV protein homolog precursor (SEQ ID NO:1463), are described in Table 1285 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).

TABLE 1285 Glycosylation site(s)
Position(s) on known amino acid sequence   Present in variant protein?   Position in variant protein?
280                                        no
97                                         yes                           97
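The coordinates quoted in the comparison report and in Tables 1284 and 1285 can be cross-checked by joining the four quoted pieces in order. The following is only an illustrative sketch: the helper names `assemble` and `residue` are hypothetical, and the N-X-S/T pattern used for the glycosylation check is the general N-linked consensus motif, which is an assumption not stated in this application.

```python
# Pieces of R16276_PEA_1_P7 exactly as quoted in the comparison report above.
FIRST = "MQSVQSTSFCLRKQCLCLTFLLLHLLGQVAATQRCPPQCPG"  # amino acids 1-41
BRIDGE = "Q"                                          # amino acid 42
SECOND = ("CPATPPTCAPGVRAVLDGCSCCLVCARQRGESCSDLEPCD"
          "ESSGLYCDRSADPSNQTGICT")                    # amino acids 43-103
TAIL = "GNPAPSAV"                                     # amino acids 104-111


def assemble(*parts):
    """Join contiguous pieces in sequential order (hypothetical helper)."""
    return "".join(parts)


def residue(seq, position):
    """Return the residue at a 1-based position (hypothetical helper)."""
    return seq[position - 1]


variant = assemble(FIRST, BRIDGE, SECOND, TAIL)

print(len(variant))          # total length implied by the quoted coordinates
print(residue(variant, 42))  # bridging residue, the site of the SNP in Table 1284
print(variant[96:99])        # residues 97-99, the site listed in Table 1285
print(variant[103:111])      # residues 104-111, the unique tail
```

Under these assumptions the assembled length agrees with the stated end coordinate of 111, and position 97 is an asparagine followed by the residues expected for an N-linked site.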

Variant protein R16276_PEA_1_P7 (SEQ ID NO:1414) is encoded by the following transcript(s): R16276_PEA_1_T6 (SEQ ID NO:150), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript R16276_PEA_1_T6 (SEQ ID NO:150) is shown in bold; this coding portion starts at position 445 and ends at position 777. The transcript also has the following SNPs as listed in Table 1286 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein R16276_PEA_1_P7 (SEQ ID NO:1414) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1286 Nucleic acid SNPs
SNP position on nucleotide sequence   Alternative nucleic acid   Previously known SNP?
371                                   G ->                       No
430                                   A -> G                     No
569                                   A -> G                     Yes
729                                   C -> A                     Yes
827                                   G -> T                     Yes
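The nucleotide positions in Table 1286 can be related to the amino acid positions in Table 1284 through the stated start of the coding portion. The sketch below assumes 1-based transcript coordinates and an uninterrupted reading frame beginning at position 445; the function name `codon_number` is hypothetical.

```python
def codon_number(nt_pos, cds_start, cds_end):
    """Map a 1-based transcript position to a 1-based codon number.

    Returns None when the position lies outside the coding portion.
    """
    if not cds_start <= nt_pos <= cds_end:
        return None
    return (nt_pos - cds_start) // 3 + 1


# Coding portion of R16276_PEA_1_T6 as stated in the text.
CDS_START, CDS_END = 445, 777

# SNP positions taken from Table 1286.
for position in (371, 430, 569, 729, 827):
    print(position, codon_number(position, CDS_START, CDS_END))
```

With these inputs the SNP at nucleotide 569 falls in codon 42, which agrees with the Q -> R change listed at amino acid 42, while positions 371, 430 and 827 fall outside the coding portion.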

As noted above, cluster R16276 features 5 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster R16276_PEA_1_node_0 (SEQ ID NO:1017) according to the present invention is supported by 35 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R16276_PEA_1_T6 (SEQ ID NO:150). Table 1287 below describes the starting and ending position of this segment on each transcript.

TABLE 1287 Segment location on transcripts
Transcript name                     Segment starting position   Segment ending position
R16276_PEA_1_T6 (SEQ ID NO: 150)    1                           438

Segment cluster R16276_PEA_1_node_6 (SEQ ID NO:1018) according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R16276_PEA_1_T6 (SEQ ID NO:150). Table 1288 below describes the starting and ending position of this segment on each transcript.

TABLE 1288 Segment location on transcripts
Transcript name                     Segment starting position   Segment ending position
R16276_PEA_1_T6 (SEQ ID NO: 150)    755                         876

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.

Segment cluster R16276_PEA_1_node_1 (SEQ ID NO:1019) according to the present invention is supported by 37 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R16276_PEA_1_T6 (SEQ ID NO:150). Table 1289 below describes the starting and ending position of this segment on each transcript.

TABLE 1289 Segment location on transcripts
Transcript name                     Segment starting position   Segment ending position
R16276_PEA_1_T6 (SEQ ID NO: 150)    439                         528

Segment cluster R16276_PEA_1_node_4 (SEQ ID NO:1020) according to the present invention is supported by 38 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R16276_PEA_1_T6 (SEQ ID NO:150). Table 1290 below describes the starting and ending position of this segment on each transcript.

TABLE 1290 Segment location on transcripts
Transcript name                     Segment starting position   Segment ending position
R16276_PEA_1_T6 (SEQ ID NO: 150)    529                         639

Segment cluster R16276_PEA_1_node_5 (SEQ ID NO:1021) according to the present invention is supported by 37 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): R16276_PEA_1_T6 (SEQ ID NO:150). Table 1291 below describes the starting and ending position of this segment on each transcript.

TABLE 1291 Segment location on transcripts
Transcript name                     Segment starting position   Segment ending position
R16276_PEA_1_T6 (SEQ ID NO: 150)    640                         754

Variant protein alignment to the previously known protein:

Combined expression of 6 sequences H61775seg8 (SEQ ID NO: 1636), HUMGRP5E junc3-7 (SEQ ID NO: 1648), M85491Seg24 (SEQ ID NO: 1639), Z21368 junc17-21 (SEQ ID NO: 1642), HSSTROL3seg24 (SEQ ID NO: 1675) and Z25299seg20 (SEQ ID NO: 1669) in normal and cancerous lung tissues.

Expression of immunoglobulin superfamily, member 9, gastrin-releasing peptide, Ephrin type-B receptor 2 precursor, SUL1_HUMAN, Stromelysin-3 precursor (EC 3.4.24.-) (Matrix metalloproteinase-11) (MMP-11) (ST3) (SL-3) and Secretory leukocyte protease inhibitor Acid-stable proteinase inhibitor transcripts detectable by or according to H61775seg8 (SEQ ID NO: 1636), HUMGRP5E junc3-7 (SEQ ID NO: 1648), M85491Seg24 (SEQ ID NO: 1639), Z21368 junc17-21 (SEQ ID NO: 1642), HSSTROL3seg24 (SEQ ID NO: 1675) and Z25299seg20 amplicons (SEQ ID NO: 1669) and H61775seg8F2 (SEQ ID NO: 1634), H61775seg8R2 (SEQ ID NO: 1635), HUMGRP5E junc3-7F (SEQ ID NO: 1646), HUMGRP5E junc3-7R (SEQ ID NO: 1647), M85491Seg24F (SEQ ID NO: 1637), M85491Seg24R (SEQ ID NO: 1638), Z21368 junc17-21F (SEQ ID NO: 1640), Z21368 junc17-21R (SEQ ID NO: 1641), HSSTROL3seg24F (SEQ ID NO: 1673), HSSTROL3seg24R (SEQ ID NO: 1674), Z25299seg20F (SEQ ID NO: 1667), Z25299seg20R (SEQ ID NO: 1668) primers was measured by real time PCR. In parallel, the expression of four housekeeping genes was measured similarly: PBGD (GenBank Accession No. BC019323 (SEQ ID NO:1713); amplicon: PBGD-amplicon, SEQ ID NO:334), HPRT1 (GenBank Accession No. NM_000194 (SEQ ID NO:1714); amplicon: HPRT1-amplicon, SEQ ID NO:1297), Ubiquitin (GenBank Accession No. BC000449 (SEQ ID NO:1711); amplicon: Ubiquitin-amplicon, SEQ ID NO:328) and SDHA (GenBank Accession No. NM_004168 (SEQ ID NO:1712); amplicon: SDHA-amplicon, SEQ ID NO:331). For each RT sample, the expression of the above amplicons was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample of each amplicon was then divided by the median of the quantities of the normal post-mortem (PM) samples detected for the same amplicon (Sample Nos. 47-50, 90-93, 96-99, Table 2, “Tissue samples in testing panel”, above), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples. The reciprocal of this ratio was calculated for Z25299seg20 (SEQ ID NO: 1669), to obtain a value of fold down-regulation for each sample relative to the median of the normal PM samples.
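The normalization described above can be sketched in a few lines. This is an illustration only, not the software used to generate the results: the function names are hypothetical, and every numeric quantity in the example is invented for demonstration rather than taken from the application.

```python
from statistics import geometric_mean, median


def normalize(quantity, housekeeping_quantities):
    """Divide an amplicon quantity by the geometric mean of the
    housekeeping-gene quantities measured in the same RT sample."""
    return quantity / geometric_mean(housekeeping_quantities)


def fold_up(normalized_sample, normalized_normals):
    """Fold up-regulation relative to the median of the normal PM samples."""
    return normalized_sample / median(normalized_normals)


def fold_down(normalized_sample, normalized_normals):
    """Reciprocal of the ratio, as described for Z25299seg20."""
    return 1.0 / fold_up(normalized_sample, normalized_normals)


# Hypothetical quantities (not data from the application).
housekeeping = [2.0, 8.0, 4.0, 4.0]  # PBGD, HPRT1, Ubiquitin, SDHA
tumor = normalize(20.0, housekeeping)
normals = [normalize(q, housekeeping) for q in (4.0, 8.0, 12.0)]

print(fold_up(tumor, normals))
print(fold_down(tumor, normals))
```

For simplicity the example reuses one set of housekeeping quantities for every sample; in the procedure described above each RT sample is normalized with its own housekeeping measurements.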

FIGS. 52-53 are histograms showing differential expression of the above-indicated transcripts in cancerous lung samples relative to the normal samples. The number and percentage of samples that exhibit at least 5-fold differential expression of at least one of the sequences, out of the total number of samples tested, is indicated at the bottom.

As is evident from FIGS. 52-53, differential expression of at least 5-fold in at least one of the sequences was found in 15 out of 15 adenocarcinoma samples, 14 out of 16 squamous cell carcinoma samples, 4 out of 4 large cell carcinoma samples and in 8 out of 8 small cell carcinoma samples.

Statistical analysis was applied to verify the significance of these results, as described below. A threshold of 5-fold differential expression of at least one of the amplicons was found to differentiate between cancer and normal samples with a P value of 7.82E-06 in adenocarcinoma, 2.63E-04 in squamous cell carcinoma, 8.24E-03 in large cell carcinoma and 3.57E-04 in small cell carcinoma, as checked by Fisher's exact test.

The above values demonstrate statistical significance of the results.
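For readers unfamiliar with the test named above, the following sketch shows how a one-sided Fisher's exact test can be computed for a 2x2 table of samples above or below the 5-fold threshold. It is not a reproduction of the reported analysis: the cancer counts are those quoted in the text, but the normal-sample counts are hypothetical placeholders, and whether the reported P values were one-sided or two-sided is not stated in this section.

```python
from math import comb


def fisher_exact_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: cancer, normal. Columns: above threshold, below threshold.
    Returns the hypergeometric probability of seeing at least `a`
    above-threshold cancer samples with all margins held fixed.
    """
    row1 = a + b          # number of cancer samples
    col1 = a + c          # number of above-threshold samples
    total = a + b + c + d
    upper = min(row1, col1)
    favourable = sum(
        comb(row1, k) * comb(total - row1, col1 - k)
        for k in range(a, upper + 1)
    )
    return favourable / comb(total, col1)


# Cancer counts as quoted in the text (above threshold, total tested).
cancer_counts = {
    "adenocarcinoma": (15, 15),
    "squamous cell carcinoma": (14, 16),
    "large cell carcinoma": (4, 4),
    "small cell carcinoma": (8, 8),
}

# Hypothetical normal-sample counts, for illustration only.
NORMAL_ABOVE, NORMAL_BELOW = 1, 11

for name, (above, tested) in cancer_counts.items():
    p = fisher_exact_one_sided(above, tested - above, NORMAL_ABOVE, NORMAL_BELOW)
    print(f"{name}: P = {p:.2e}")
```

Because the normal counts are placeholders, the printed values are not expected to match the P values reported above.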

Description for Cluster H53626

Cluster H53626 features 2 transcript(s) and 20 segment(s) of interest, the names for which are given in Tables 1292 and 1293, respectively; the sequences themselves are given at the end of the application.

TABLE 1292 Transcripts of interest
Transcript Name         SEQ ID NO:
H53626_PEA_1_T15        16
H53626_PEA_1_T16        17

TABLE 1293 Segments of interest
Segment Name            SEQ ID NO:
H53626_PEA_1_node_15    18
H53626_PEA_1_node_22    19
H53626_PEA_1_node_25    306
H53626_PEA_1_node_26    307
H53626_PEA_1_node_27    308
H53626_PEA_1_node_34    309
H53626_PEA_1_node_35    310
H53626_PEA_1_node_36    311
H53626_PEA_1_node_11    312
H53626_PEA_1_node_12    313
H53626_PEA_1_node_16    314
H53626_PEA_1_node_19    315
H53626_PEA_1_node_20    316
H53626_PEA_1_node_24    317
H53626_PEA_1_node_28    318
H53626_PEA_1_node_29    319
H53626_PEA_1_node_30    320
H53626_PEA_1_node_31    321
H53626_PEA_1_node_32    322
H53626_PEA_1_node_33    323

TABLE 1294 Proteins of interest
Protein Name            SEQ ID NO:
H53626_PEA_1_P4         324
H53626_PEA_1_P5         325

Cluster H53626 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term “number” in the right hand column of the table and the numbers on the y-axis of FIG. 76 below refer to weighted expression of ESTs in each category, as “parts per million” (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).

Overall, the following results were obtained as shown with regard to the histograms in FIG. 76 and Table 1295. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: epithelial malignant tumors, a mixture of malignant tumors from different tissues and myosarcoma.

TABLE 1295 Normal tissue distribution
Name of Tissue    Number
adrenal           4
bone              233
brain             33
colon             0
epithelial        12
general           17
head and neck     0
kidney            8
lung              25
breast            8
muscle            0
ovary             7
pancreas          10
prostate          8
skin              0
stomach           73
Thyroid           0
uterus            0

TABLE 1296 P values and ratios for expression in cancerous tissue
Name of Tissue    P1        P2        SP1       R3     SP2       R4
adrenal           6.4e−01   4.2e−01   2.1e−01   3.1    1.3e−02   4.1
bone              5.8e−01   8.1e−01   9.8e−01   0.3    1.0e+00   0.3
brain             2.2e−01   2.6e−01   8.1e−01   0.8    8.9e−01   0.6
colon             2.3e−01   1.4e−01   1.5e+00   1.2    4.6e−01   1.9
epithelial        8.3e−02   4.8e−03   6.4e−02   1.5    6.6e−08   4.1
general           2.4e−03   1.5e−05   1.1e−03   1.6    2.0e−12   3.1
head and neck     2.1e−01   3.3e−01   0.0e+00   0.0    0.0e+00   0.0
kidney            7.3e−01   5.8e−01   5.8e−01   1.3    5.7e−02   2.0
lung              8.3e−01   5.5e−01   7.9e−01   0.8    3.2e−02   2.1
breast            6.5e−01   2.7e−01   6.9e−01   1.2    7.8e−02   1.9
muscle            1.5e+00   2.9e−01   1.5e+00   1.0    3.5e−03   4.1
ovary             6.7e−01   5.6e−01   1.5e−01   1.7    7.0e−02   2.7
pancreas          2.3e−01   2.0e−01   3.9e−01   1.9    8.2e−02   2.3
prostate          9.0e−01   9.0e−01   6.7e−01   1.1    1.8e−01   1.9
skin              1.5e+00   4.4e−01   1.5e+00   1.0    6.4e−01   1.6
stomach           9.0e−01   3.4e−01   1.0e+00   0.3    6.1e−01   0.9
Thyroid           2.4e−01   2.4e−01   1.5e+00   1.1    1.5e+00   1.1
uterus            2.1e−01   2.4e−01   2.9e−01   2.5    2.6e−01   2.2

As noted above, contig H53626 features 2 transcript(s), which were listed in Table 1292 above. A description of each variant protein according to the present invention is now provided.

Variant protein H53626_PEA_1_P4 (SEQ ID NO:324) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) H53626_PEA_1_T15 (SEQ ID NO:16). The alignment to the wild type protein is given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to the wild type protein is as follows:

Comparison report between H53626_PEA_1_P4 (SEQ ID NO:324) and wild type Q8N441 (SEQ ID NO:1699):

1. An isolated chimeric polypeptide encoding for H53626_PEA_1_P4 (SEQ ID NO:324), comprising a first amino acid sequence being at least 90% homologous to MTPSPLLLLLLPPLLLGAFPPAAAARGPPKMADKVVPRQVARLGRTVRLQCPVEGDPPPLTMWTKDGRTIHSGWSRFRVLPQGLKVKQVEREDAGVYVCKATNGFGSLSVNYTLVVLDDISPGKESLGPDSSSGGQEDPASQQWARPRFTQPSKMRRRVIARPVGSSVRLKCVASGHPRPDITWMKDDQALTRPEAAEPRKKKWTLSLKNLRPEDSGKYTCRVSNRAGAINATYKVDVIQRTRSKPVLTGTHPVNTTVDFGGTTSFQCKVRSDVKPVIQWLKRVEYGAEGRHNSTIDVGGQKFVVLPTGDVWSRPDGSYLNKLLITRARQDDAGMYICLGANTMGYSFRSAFLTVLP corresponding to amino acids 1-357 of Q8N441 (SEQ ID NO:1699), which also corresponds to amino acids 1-357 of H53626_PEA_1_P4 (SEQ ID NO:324), a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GARLPRHATPCWCPDPPPGPGVPPTGWGPTLPSRAVLARSSAEGGQPRGTVSTAPGMGLGCSPGLCVGVPLPTSFPLALA (SEQ ID NO: 1775) corresponding to amino acids 358-437 of H53626_PEA_1_P4 (SEQ ID NO:324), and a third amino acid sequence being at least 90% homologous to DPKPPGPPVASSSSATSLPWPVVIGIPAGAVFILGTLLLWLCQAQKKPCTPAPAPPLPGHRPPGTARDRSGDKDLPSLAALSAGPGVGLCEEHGSPAAPQHLLGPGPVAGPKLYPKLYTDIHTHTHTHSHTHSHVEGKVHQHIHYQC corresponding to amino acids 358-504 of Q8N441 (SEQ ID NO:1699), which also corresponds to amino acids 438-584 of H53626_PEA_1_P4 (SEQ ID NO:324), wherein said first, second and third amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for an edge portion of H53626_PEA_1_P4 (SEQ ID NO:324), comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for GARLPRHATPCWCPDPPPGPGVPPTGWGPTLPSRAVLARSSAEGGQPRGTVSTAPGMGLGCSPGLCVGVPLPTSFPLALA (SEQ ID NO: 1775), corresponding to H53626_PEA_1_P4 (SEQ ID NO:324).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because although both signal-peptide prediction programs agree that this protein has a signal peptide, both trans-membrane region prediction programs predict that this protein has a trans-membrane region downstream of this signal peptide.

Variant protein H53626_PEA_1_P4 (SEQ ID NO:324) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 1297 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein H53626_PEA_1_P4 (SEQ ID NO:324) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1297 Amino acid mutations
SNP position(s) on amino acid sequence   Alternative amino acid(s)   Previously known SNP?
193                                      R -> L                      Yes
300                                      G ->                        No
319                                      Y -> H                      No
442                                      P -> Q                      Yes
504                                      R -> L                      Yes
521                                      G ->                        No
544                                      P -> L                      Yes
573                                      E -> G                      No

Variant protein H53626_PEA_1_P4 (SEQ ID NO:324) is encoded by the following transcript(s): H53626_PEA_1_T15 (SEQ ID NO:16), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript H53626_PEA_1_T15 (SEQ ID NO:16) is shown in bold; this coding portion starts at position 17 and ends at position 1771. The transcript also has the following SNPs as listed in Table 1298 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein H53626_PEA_1_P4 (SEQ ID NO:324) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1298 Nucleic acid SNPs
SNP position on nucleotide sequence   Alternative nucleic acid   Previously known SNP?
76                                    G -> A                     Yes
340                                   G -> T                     No
1647                                  C -> T                     Yes
1734                                  A -> G                     No
1797                                  G ->                       No
1948                                  A -> G                     Yes
2193                                  C -> T                     Yes
2308                                  C -> T                     Yes
2333                                  C -> G                     Yes
2648                                  C -> T                     Yes
2649                                  G -> A                     Yes
2765                                  C -> T                     Yes
594                                   G -> T                     Yes
2972                                  G -> A                     Yes
3027                                  C -> G                     Yes
907                                   T -> C                     Yes
916                                   C ->                       No
971                                   T -> C                     No
1135                                  G -> A                     Yes
1341                                  C -> A                     Yes
1527                                  G -> T                     Yes
1579                                  C ->                       No

Variant protein H53626_PEA_1_P5 (SEQ ID NO:325) according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) H53626_PEA_1_T16 (SEQ ID NO:17). The alignment to the wild type protein is given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to the wild type protein is as follows:

Comparison report between H53626_PEA_1_P5 (SEQ ID NO:325) and wild type Q9H4D7 (SEQ ID NO:1700):

1. An isolated chimeric polypeptide encoding for H53626_PEA_1_P5 (SEQ ID NO:325), comprising a first amino acid sequence being at least 90% homologous to MTPSPLLLLLLPPLLLGAFPPAAAARGPPKMADKVVPRQVARLGRTVRLQCPVEGDPPPLTMWTKDGRTIHSGWSRFRVLPQGLKVKQVEREDAGVYVCKATNGFGSLSVNYTLVVLDDISPGKESLGPDSSSGGQEDPASQQWARPRFTQPSKMRRRVIARPVGSSVRLKCVASGHPRPDITWMKDDQALTRPEAAEPRKKKWTLSLKNLRPEDSGKYTCRVSNRAGAINATYKVDVIQRTRSKPVLTGTHPVNTTVDFGGTTSFQCK corresponding to amino acids 1-269 of Q9H4D7 (SEQ ID NO:1700), which also corresponds to amino acids 1-269 of H53626_PEA_1_P5 (SEQ ID NO:325), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TQNRQGHLWPPRPRPLACRGPWSSASQPALSSSWAPCSCGFARPRRSRAPPRLPLPCLGTARRGRPATAAETRTFPRWPPSALALVWGCVRSMGLRQPPSTYWAQAQLLALSCTPNSTQTSTHTHTHTLTHTHTWRARSTSTSTISARRHRICSGHGGAGQTGRLGGWRTELQTKAGDPWRGGMASTPGSLCVRHSPWTHTHRHTHYLDACMHTHARTRAP (SEQ ID NO: 1776) corresponding to amino acids 270-490 of H53626_PEA_1_P5 (SEQ ID NO:325), wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of H53626_PEA_1_P5 (SEQ ID NO:325), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TQNRQGHLWPPRPRPLACRGPWSSASQPALSSSWAPCSCGFARPRRSRAPPRLPLPCLGTARRGRPATAAETRTFPRWPPSALALVWGCVRSMGLRQPPSTYWAQAQLLALSCTPNSTQTSTHTHTHTLTHTHTWRARSTSTSTISARRHRICSGHGGAGQTGRLGGWRTELQTKAGDPWRGGMASTPGSLCVRHSPWTHTHRHTHYLDACMHTHARTRAP (SEQ ID NO: 1776) in H53626_PEA_1_P5 (SEQ ID NO:325).

Comparison report between H53626_PEA_1_P5 (SEQ ID NO:325) and wild type Q8N441 (SEQ ID NO:1699):

1. An isolated chimeric polypeptide encoding for H53626_PEA_1_P5 (SEQ ID NO:325), comprising a first amino acid sequence being at least 90% homologous to MTPSPLLLLLLPPLLLGAFPPAAAARGPPKMADKVVPRQVARLGRTVRLQCPVEGDPPPLTMWTKDGRTIHSGWSRFRVLPQGLKVKQVEREDAGVYVCKATNGFGSLSVNYTLVVLDDISPGKESLGPDSSSGGQEDPASQQWARPRFTQPSKMRRRVIARPVGSSVRLKCVASGHPRPDITWMKDDQALTRPEAAEPRKKKWTLSLKNLRPEDSGKYTCRVSNRAGAINATYKVDVIQRTRSKPVLTGTHPVNTTVDFGGTTSFQCK corresponding to amino acids 1-269 of Q8N441 (SEQ ID NO:1699), which also corresponds to amino acids 1-269 of H53626_PEA_1_P5 (SEQ ID NO:325), and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TQNRQGHLWPPRPRPLACRGPWSSASQPALSSSWAPCSCGFARPRRSRAPPRLPLPCLGTARRGRPATAAETRTFPRWPPSALALVWGCVRSMGLRQPPSTYWAQAQLLALSCTPNSTQTSTHTHTHTLTHTHTWRARSTSTSTISARRHRICSGHGGAGQTGRLGGWRTELQTKAGDPWRGGMASTPGSLCVRHSPWTHTHRHTHYLDACMHTHARTRAP (SEQ ID NO: 1776) corresponding to amino acids 270-490 of H53626_PEA_1_P5 (SEQ ID NO:325), wherein said first and second amino acid sequences are contiguous and in a sequential order.

2. An isolated polypeptide encoding for a tail of H53626_PEA_1_P5 (SEQ ID NO:325), comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TQNRQGHLWPPRPRPLACRGPWSSASQPALSSSWAPCSCGFARPRRSRAPPRLPLPCLGTARRGRPATAAETRTFPRWPPSALALVWGCVRSMGLRQPPSTYWAQAQLLALSCTPNSTQTSTHTHTHTLTHTHTWRARSTSTSTISARRHRICSGHGGAGQTGRLGGWRTELQTKAGDPWRGGMASTPGSLCVRHSPWTHTHRHTHYLDACMHTHARTRAP (SEQ ID NO: 1776) in H53626_PEA_1_P5 (SEQ ID NO:325).

The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.

Variant protein H53626_PEA_1_P5 (SEQ ID NO:325) also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 1299 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein H53626_PEA_1_P5 (SEQ ID NO:325) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1299 Amino acid mutations
SNP position(s) on amino acid sequence   Alternative amino acid(s)   Previously known SNP?
193                                      R -> L                      Yes
274                                      Q -> K                      Yes
336                                      A -> S                      Yes
353                                      A ->                        No
376                                      Q -> *                      Yes
405                                      R -> G                      No
426                                      G ->                        No
476                                      Y -> C                      Yes

Variant protein H53626_PEA_1_P5 (SEQ ID NO:325) is encoded by the following transcript(s): H53626_PEA_1_T16 (SEQ ID NO:17), for which the sequence(s) is/are given at the end of the application. The coding portion of transcript H53626_PEA_1_T16 (SEQ ID NO:17) is shown in bold; this coding portion starts at position 17 and ends at position 1489. The transcript also has the following SNPs as listed in Table 1300 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein H53626_PEA_1_P5 (SEQ ID NO:325) sequence provides support for the deduced sequence of this variant protein according to the present invention).

TABLE 1300 Nucleic acid SNPs
SNP position on nucleotide sequence   Alternative nucleic acid   Previously known SNP?
76                                    G -> A                     Yes
340                                   G -> T                     No
1688                                  C -> T                     Yes
1803                                  C -> T                     Yes
1828                                  C -> G                     Yes
2143                                  C -> T                     Yes
2144                                  G -> A                     Yes
2260                                  C -> T                     Yes
2467                                  G -> A                     Yes
2522                                  C -> G                     Yes
594                                   G -> T                     Yes
836                                   C -> A                     Yes
1022                                  G -> T                     Yes
1074                                  C ->                       No
1142                                  C -> T                     Yes
1229                                  A -> G                     No
1292                                  G ->                       No
1443                                  A -> G                     Yes

As noted above, cluster H53626 features 20 segment(s), which were listed in Table 1293 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.

Segment cluster H53626_PEA_1_node_15 (SEQ ID NO:18) according to the present invention is supported by 25 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53626_PEA_1_T15 (SEQ ID NO:16) and H53626_PEA_1_T16 (SEQ ID NO:17). Table 1301 below describes the starting and ending position of this segment on each transcript.

TABLE 1301 Segment location on transcripts
Transcript name                     Segment starting position   Segment ending position
H53626_PEA_1_T15 (SEQ ID NO: 16)    96                          343
H53626_PEA_1_T16 (SEQ ID NO: 17)    96                          343

Segment cluster H53626_PEA_1_node_22 (SEQ ID NO:19) according to the present invention is supported by 42 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53626_PEA_1_T15 (SEQ ID NO:16) and H53626_PEA_1_T16 (SEQ ID NO:17). Table 1302 below describes the starting and ending position of this segment on each transcript.

TABLE 1302 Segment location on transcripts
Transcript name                     Segment starting position   Segment ending position
H53626_PEA_1_T15 (SEQ ID NO: 16)    450                         734
H53626_PEA_1_T16 (SEQ ID NO: 17)    450                         734

Segment cluster H53626_PEA_1_node_25 (SEQ ID NO:306) according to the present invention is supported by 41 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53626_PEA_1_T15 (SEQ ID NO:16). Table 1303 below describes the starting and ending position of this segment on each transcript.

TABLE 1303 Segment location on transcripts
Transcript name                     Segment starting position   Segment ending position
H53626_PEA_1_T15 (SEQ ID NO: 16)    824                         1088

Segment cluster H53626_PEA_1_node_26 (SEQ ID NO:307) according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53626_PEA_1_T15 (SEQ ID NO:16). Table 1304 below describes the starting and ending position of this segment on each transcript.

TABLE 1304 Segment location on transcripts
Transcript name                     Segment starting position   Segment ending position
H53626_PEA_1_T15 (SEQ ID NO: 16)    1089                        1328

Segment cluster H53626_PEA_1_node_27 (SEQ ID NO:308) according to the present invention is supported by 106 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53626_PEA_1_T15 (SEQ ID NO:16) and H53626_PEA_1_T16 (SEQ ID NO:17). Table 1305 below describes the starting and ending position of this segment on each transcript.

TABLE 1305 Segment location on transcripts
Transcript name                     Segment starting position   Segment ending position
H53626_PEA_1_T15 (SEQ ID NO: 16)    1329                        2228
H53626_PEA_1_T16 (SEQ ID NO: 17)    824                         1723
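The different coordinates of node_27 on the two transcripts in Table 1305 can be reconciled with Tables 1303 and 1304, which place node_25 and node_26 on transcript T15 only. The short sketch below, with a hypothetical helper `length`, checks that the coordinate shift equals the combined length of those two segments; it assumes inclusive 1-based coordinates.

```python
def length(start, end):
    """Length of a segment given inclusive 1-based coordinates."""
    return end - start + 1


# Coordinates taken from Tables 1303, 1304 and 1305.
node_25_t15 = (824, 1088)
node_26_t15 = (1089, 1328)
node_27_t15 = (1329, 2228)
node_27_t16 = (824, 1723)

t15_only = length(*node_25_t15) + length(*node_26_t15)
shift = node_27_t15[0] - node_27_t16[0]

print(t15_only, shift)
print(length(*node_27_t15), length(*node_27_t16))
```

The two printed pairs agree with each other, which is consistent with transcript T16 lacking node_25 and node_26 while sharing node_27 in full.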

Segment cluster H53626_PEA_1_node_34 (SEQ ID NO:309) according to the present invention is supported by 121 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53626_PEA_1_T15 (SEQ ID NO:16) and H53626_PEA_1_T16 (SEQ ID NO:17). Table 1306 below describes the starting and ending position of this segment on each transcript.

TABLE 1306 Segment location on transcripts
Transcript name                     Segment starting position   Segment ending position
H53626_PEA_1_T15 (SEQ ID NO: 16)    2507                        2977
H53626_PEA_1_T16 (SEQ ID NO: 17)    2002                        2472

Segment cluster H53626_PEA_1_node_35 (SEQ ID NO:310) according to the present invention is supported by 85 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53626_PEA_1_T15 (SEQ ID NO:16) and H53626_PEA_1_T16 (SEQ ID NO:17). Table 1307 below describes the starting and ending position of this segment on each transcript.

TABLE 1307 Segment location on transcripts
Transcript name                     Segment starting position   Segment ending position
H53626_PEA_1_T15 (SEQ ID NO: 16)    2978                        3148
H53626_PEA_1_T16 (SEQ ID NO: 17)    2473                        2643

Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment, shown in Table 1308.

TABLE 1308 Oligonucleotides related to this segment
Oligonucleotide name   Overexpressed in cancers   Chip reference
NA

Segment cluster H53626_PEA_1_node_36 (SEQ ID NO:311) according to the present invention is supported by 69 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53626_PEA_1_T15 (SEQ ID NO:16) and H53626_PEA_1_T16 (SEQ ID NO:17). Table 1309 below describes the starting and ending position of this segment on each transcript.

TABLE 1309 Segment location on transcripts
Transcript name                     Segment starting position   Segment ending position
H53626_PEA_1_T15 (SEQ ID NO: 16)    3149                        3322
H53626_PEA_1_T16 (SEQ ID NO: 17)    2644                        2817

Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment, shown in Table 1310.

TABLE 1310 Oligonucleotides related to this segment
Oligonucleotide name   Overexpressed in cancers   Chip reference
NA

According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.

Segment cluster H53626_PEA_1_node_11 (SEQ ID NO:312) according to the present invention is supported by 12 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53626_PEA_1_T15 (SEQ ID NO:16) and H53626_PEA_1_T16 (SEQ ID NO:17). Table 1311 below describes the starting and ending position of this segment on each transcript.

TABLE 1311 Segment location on transcripts
Transcript name                     Segment starting position   Segment ending position
H53626_PEA_1_T15 (SEQ ID NO: 16)    1                           55
H53626_PEA_1_T16 (SEQ ID NO: 17)    1                           55

Segment cluster H53626_PEA_1_node_12 (SEQ ID NO:313) according to the present invention is supported by 11 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53626_PEA_1_T15 (SEQ ID NO:16) and H53626_PEA_1_T16 (SEQ ID NO:17). Table 1312 below describes the starting and ending position of this segment on each transcript.

TABLE 1312 Segment location on transcripts
Transcript name                     Segment starting position   Segment ending position
H53626_PEA_1_T15 (SEQ ID NO: 16)    56                          95
H53626_PEA_1_T16 (SEQ ID NO: 17)    56                          95

Segment cluster H53626_PEA_1_node_16 (SEQ ID NO:314) according to the present invention can be found in the following transcript(s): H53626_PEA_1_T15 (SEQ ID NO:16) and H53626_PEA_1_T16 (SEQ ID NO:17). Table 1313 below describes the starting and ending position of this segment on each transcript.

TABLE 1313 Segment location on transcripts
Transcript name                     Segment starting position   Segment ending position
H53626_PEA_1_T15 (SEQ ID NO: 16)    344                         368
H53626_PEA_1_T16 (SEQ ID NO: 17)    344                         368

Segment cluster H53626_PEA_1_node_19 (SEQ ID NO:315) according to the present invention is supported by 25 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53626_PEA_1_T15 (SEQ ID NO:16) and H53626_PEA_1_T16 (SEQ ID NO:17). Table 1314 below describes the starting and ending position of this segment on each transcript.

TABLE 1314 Segment location on transcripts
Transcript name                     Segment starting position   Segment ending position
H53626_PEA_1_T15 (SEQ ID NO: 16)    369                         419
H53626_PEA_1_T16 (SEQ ID NO: 17)    369                         419

Segment cluster H53626_PEA_1_node_20 (SEQ ID NO:316) according to the present invention is supported by 27 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53626_PEA_1_T15 (SEQ ID NO:16) and H53626_PEA_1_T16 (SEQ ID NO:17). Table 1315 below describes the starting and ending position of this segment on each transcript.

TABLE 1315
Segment location on transcripts

Transcript name                     Segment starting position    Segment ending position
H53626_PEA_1_T15 (SEQ ID NO:16)     420                          449
H53626_PEA_1_T16 (SEQ ID NO:17)     420                          449

Segment cluster H53626_PEA_1_node_24 (SEQ ID NO:317) according to the present invention is supported by 34 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53626_PEA_1_T15 (SEQ ID NO:16) and H53626_PEA_1_T16 (SEQ ID NO:17). Table 1316 below describes the starting and ending position of this segment on each transcript.

TABLE 1316
Segment location on transcripts

Transcript name                     Segment starting position    Segment ending position
H53626_PEA_1_T15 (SEQ ID NO:16)     735                          823
H53626_PEA_1_T16 (SEQ ID NO:17)     735                          823

Segment cluster H53626_PEA_1_node_28 (SEQ ID NO:318) according to the present invention is supported by 66 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53626_PEA_1_T15 (SEQ ID NO:16) and H53626_PEA_1_T16 (SEQ ID NO:17). Table 1317 below describes the starting and ending position of this segment on each transcript.

TABLE 1317
Segment location on transcripts

Transcript name                     Segment starting position    Segment ending position
H53626_PEA_1_T15 (SEQ ID NO:16)     2229                         2306
H53626_PEA_1_T16 (SEQ ID NO:17)     1724                         1801
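By way of non-limiting illustration only, the coordinates of Table 1317 may be checked for internal consistency with a short script. The sketch below assumes that positions are 1-based and inclusive (an assumption, although it agrees with adjacent segments ending at 55 and starting at 56); the names `SEGMENT_NODE_28` and `segment_length` are hypothetical and do not appear in the sequence listing.

```python
# Coordinates of segment H53626_PEA_1_node_28 on each transcript (Table 1317).
# Assumption: positions are 1-based and inclusive.
SEGMENT_NODE_28 = {
    "H53626_PEA_1_T15": (2229, 2306),
    "H53626_PEA_1_T16": (1724, 1801),
}


def segment_length(start: int, end: int) -> int:
    """Length in nucleotides of a segment given inclusive 1-based bounds."""
    return end - start + 1


# The same segment should have the same length on every transcript.
lengths = {name: segment_length(*pos) for name, pos in SEGMENT_NODE_28.items()}

# Difference between the starting positions on the two transcripts.
offset = SEGMENT_NODE_28["H53626_PEA_1_T15"][0] - SEGMENT_NODE_28["H53626_PEA_1_T16"][0]

if __name__ == "__main__":
    print(lengths, offset)
```

A segment whose length differed between transcripts would point to a transcription error in the table rather than to a biological difference.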

Segment cluster H53626_PEA_1_node_29 (SEQ ID NO:319) according to the present invention is supported by 73 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53626_PEA_1_T15 (SEQ ID NO:16) and H53626_PEA_1_T16 (SEQ ID NO:17). Table 1318 below describes the starting and ending position of this segment on each transcript.

TABLE 1318
Segment location on transcripts

Transcript name                     Segment starting position    Segment ending position
H53626_PEA_1_T15 (SEQ ID NO:16)     2307                         2396
H53626_PEA_1_T16 (SEQ ID NO:17)     1802                         1891

Segment cluster H53626_PEA_1_node_30 (SEQ ID NO:320) according to the present invention is supported by 71 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53626_PEA_1_T15 (SEQ ID NO:16) and H53626_PEA_1_T16 (SEQ ID NO:17). Table 1319 below describes the starting and ending position of this segment on each transcript.

TABLE 1319
Segment location on transcripts

Transcript name                     Segment starting position    Segment ending position
H53626_PEA_1_T15 (SEQ ID NO:16)     2397                         2442
H53626_PEA_1_T16 (SEQ ID NO:17)     1892                         1937

Segment cluster H53626_PEA_1_node_31 (SEQ ID NO:321) according to the present invention is supported by 67 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53626_PEA_1_T15 (SEQ ID NO:16) and H53626_PEA_1_T16 (SEQ ID NO:17). Table 1320 below describes the starting and ending position of this segment on each transcript.

TABLE 1320
Segment location on transcripts

Transcript name                     Segment starting position    Segment ending position
H53626_PEA_1_T15 (SEQ ID NO:16)     2443                         2469
H53626_PEA_1_T16 (SEQ ID NO:17)     1938                         1964

Segment cluster H53626_PEA_1_node_32 (SEQ ID NO:322) according to the present invention is supported by 65 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): H53626_PEA_1_T15 (SEQ ID NO:16) and H53626_PEA_1_T16 (SEQ ID NO:17). Table 1321 below describes the starting and ending position of this segment on each transcript.

TABLE 1321
Segment location on transcripts

Transcript name                     Segment starting position    Segment ending position
H53626_PEA_1_T15 (SEQ ID NO:16)     2470                         2498
H53626_PEA_1_T16 (SEQ ID NO:17)     1965                         1993

Segment cluster H53626_PEA_1_node_33 (SEQ ID NO:323) according to the present invention can be found in the following transcript(s): H53626_PEA_1_T15 (SEQ ID NO:16) and H53626_PEA_1_T16 (SEQ ID NO:17). Table 1322 below describes the starting and ending position of this segment on each transcript.

TABLE 1322
Segment location on transcripts

Transcript name                     Segment starting position    Segment ending position
H53626_PEA_1_T15 (SEQ ID NO:16)     2499                         2506
H53626_PEA_1_T16 (SEQ ID NO:17)     1994                         2001

Variant protein alignment to the previously known protein:

Expression of Homo sapiens Fibroblast Growth Factor Receptor-Like 1 (FGFRL1) H53626 Transcripts, which are Detectable by Amplicon as Depicted in Sequence Name H53626 junc24-27F1R3 (SEQ ID NO: 1690) in Normal and Cancerous Lung Tissues

Expression of Homo sapiens fibroblast growth factor receptor-like 1 (FGFRL1) transcripts detectable by or according to junc24-27, H53626 junc24-27F1R3 amplicon (SEQ ID NO: 1690) and H53626 junc24-27F1 (SEQ ID NO: 1688) and H53626 junc24-27R3 (SEQ ID NO: 1689) primers was measured by real time PCR. In parallel the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323 (SEQ ID NO:1713); amplicon: PBGD-amplicon, SEQ ID NO:334), HPRT1 (GenBank Accession No. NM_000194 (SEQ ID NO:1714); amplicon: HPRT1-amplicon, SEQ ID NO:1297), UBC (GenBank Accession No. BC000449 (SEQ ID NO:1711); amplicon: Ubiquitin-amplicon, SEQ ID NO:328) and SDHA (GenBank Accession No. NM_004168 (SEQ ID NO:1712); amplicon: SDHA-amplicon, SEQ ID NO:331), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 47-50, 90-93, 96-99, Table 2, above), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples.
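By way of non-limiting illustration only, the normalization described above may be sketched as follows. The function names and all numerical quantities in the usage lines are hypothetical and are not taken from the experiments; only the two steps themselves (division by the geometric mean of the housekeeping gene quantities, then division by the median of the normal PM samples) follow the description.

```python
from statistics import geometric_mean, median


def normalized_quantity(target_qty, housekeeping_qtys):
    """Normalize the target amplicon quantity of one RT sample to the
    geometric mean of its housekeeping gene quantities."""
    return target_qty / geometric_mean(housekeeping_qtys)


def fold_upregulation(normalized, normal_sample_ids):
    """Divide every normalized quantity by the median of the normalized
    quantities of the normal post-mortem (PM) samples."""
    reference = median(normalized[s] for s in normal_sample_ids)
    return {sample: qty / reference for sample, qty in normalized.items()}


if __name__ == "__main__":
    # Hypothetical quantities: (target amplicon, [PBGD, HPRT1, UBC, SDHA]).
    raw = {
        "normal_a": (4.0, [2.0, 2.0, 8.0, 8.0]),
        "normal_b": (12.0, [2.0, 2.0, 8.0, 8.0]),
        "normal_c": (8.0, [2.0, 2.0, 8.0, 8.0]),
        "tumor_x": (40.0, [2.0, 2.0, 8.0, 8.0]),
    }
    normalized = {s: normalized_quantity(t, hk) for s, (t, hk) in raw.items()}
    print(fold_upregulation(normalized, ["normal_a", "normal_b", "normal_c"]))
```

The geometric mean is less sensitive than the arithmetic mean to a single housekeeping gene with an unusually high quantity, and the median makes the reference level robust to an outlying normal sample.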

FIG. 74 is a histogram showing over expression of the above-indicated Homo sapiens fibroblast growth factor receptor-like 1 (FGFRL1) transcripts in cancerous lung samples relative to the normal samples.

As is evident from FIG. 74, the expression of Homo sapiens fibroblast growth factor receptor-like 1 (FGFRL1) transcripts detectable by the above amplicon(s) was higher in several cancer samples than in the non-cancerous samples (Sample Nos. 46-50, 90-93, 96-99, Table 2). Notably, an over-expression of at least 5 fold was found in 7 out of 15 adenocarcinoma samples.

Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: H53626 junc24-27F1 forward primer (SEQ ID NO: 1688); and H53626 junc24-27R3 reverse primer (SEQ ID NO: 1689).

The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: H53626 junc24-27F1R3 (SEQ ID NO: 1690).

Forward primer (SEQ ID NO: 1688): GTCCTTCCAGTGCAAGACCCA
Reverse primer (SEQ ID NO: 1689): TGGGCCTGGCAAAGCC
Amplicon (SEQ ID NO: 1690): GTCCTTCCAGTGCAAGACCCAAAACCGCCAGGGCCACCTGTGGCCTCCTCGTCCTCGGCCACTAGCCTGCCGTGGCCCGTGGTCATCGGCATCCCAGCCGGCGCTGTCTTCATCCTGGGCACCCTGCTCCTGTGGCTTTGCCAGGCCCA
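By way of non-limiting illustration only, the relationship between the primer pair and the amplicon listed above may be checked programmatically: an amplicon is expected to begin with the forward primer and to end with the reverse complement of the reverse primer. The sketch below uses the sequences of SEQ ID NOs: 1688-1690 as listed; the function and constant names are hypothetical.

```python
# Complement table for unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def primers_flank_amplicon(forward: str, reverse: str, amplicon: str) -> bool:
    """True if the amplicon starts with the forward primer and ends with
    the reverse complement of the reverse primer."""
    amplicon = amplicon.upper()
    return amplicon.startswith(forward.upper()) and amplicon.endswith(
        reverse_complement(reverse)
    )


# Sequences as listed for H53626 junc24-27 (SEQ ID NOs: 1688, 1689, 1690).
FORWARD = "GTCCTTCCAGTGCAAGACCCA"
REVERSE = "TGGGCCTGGCAAAGCC"
AMPLICON = (
    "GTCCTTCCAGTGCAAGACCCAAAACCGCCAGGGCCACCTGTGGCCTCCTCGTCCTCGGCCACTAGCC"
    "TGCCGTGGCCCGTGGTCATCGGCATCCCAGCCGGCGCTGTCTTCATCCTGGGCACCCTGCTCCTGTG"
    "GCTTTGCCAGGCCCA"
)

if __name__ == "__main__":
    print(primers_flank_amplicon(FORWARD, REVERSE, AMPLICON))
```

Such a check only confirms that the listed sequences are mutually consistent; it says nothing about primer specificity or amplification efficiency.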

Expression of Homo sapiens Fibroblast Growth Factor Receptor-Like 1 (FGFRL1) H53626 Transcripts, which are Detectable by Amplicon as Depicted in Sequence Name H53626 seg25 (SEQ ID NO: 1693) in Normal and Cancerous Lung Tissues

Expression of Homo sapiens fibroblast growth factor receptor-like 1 (FGFRL1) transcripts detectable by or according to seg25, H53626 seg25 amplicon (SEQ ID NO: 1693) and H53626 seg25F (SEQ ID NO: 1691) and H53626 seg25R (SEQ ID NO: 1692) primers was measured by real time PCR. In parallel the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323 (SEQ ID NO:1713); amplicon: PBGD-amplicon, SEQ ID NO:334), HPRT1 (GenBank Accession No. NM_000194 (SEQ ID NO:1714); amplicon: HPRT1-amplicon, SEQ ID NO:1297), UBC (GenBank Accession No. BC000449 (SEQ ID NO:1711); amplicon: Ubiquitin-amplicon, SEQ ID NO:328) and SDHA (GenBank Accession No. NM_004168 (SEQ ID NO:1712); amplicon: SDHA-amplicon, SEQ ID NO:331), was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 47-50, 90-93, 96-99, Table 2, above), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples.

As is evident from FIG. 75, the expression of Homo sapiens fibroblast growth factor receptor-like 1 (FGFRL1) transcripts detectable by the above amplicon(s) was higher in a few cancer samples than in the non-cancerous samples (Sample Nos. 46-50, 90-93, 96-99, Table 2). Notably, an over-expression of at least 5 fold was found in 3 out of 15 adenocarcinoma samples.

Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: H53626 seg25F forward primer (SEQ ID NO: 1691); and H53626 seg25R reverse primer (SEQ ID NO: 1692).

The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: H53626 seg25 (SEQ ID NO: 1693).

Forward primer (SEQ ID NO: 1691): CCGACGGCTCCTACCTCAA
Reverse primer (SEQ ID NO: 1692): GGAAGCTGTAGCCCATGGTGT
Amplicon (SEQ ID NO: 1693): CCGACGGCTCCTACCTCAATAAGCTGCTCATCACCCGTGCCCGCCAGGACGATGCGGGCATGTACATCTGCCTTGGCGCCAACACCATGGGCTACAGCTTCC

Expression of Homo sapiens Fibroblast Growth Factor Receptor-Like 1 (FGFRL1) H53626 Transcripts, which are Detectable by Amplicon as Depicted in Sequence Name H53626 seg25 (SEQ ID NO: 1693) in Different Normal Tissues

Expression of Homo sapiens fibroblast growth factor receptor-like 1 (FGFRL1) transcripts detectable by or according to H53626 seg25 amplicon (SEQ ID NO: 1693) and H53626 seg25F (SEQ ID NO: 1691) and H53626 seg25R (SEQ ID NO: 1692) was measured by real time PCR. In parallel the expression of four housekeeping genes: RPL19 (GenBank Accession No. NM_000981 (SEQ ID NO:1715); RPL19 amplicon, SEQ ID NO:1630), TATA box (GenBank Accession No. NM_003194 (SEQ ID NO:1716); TATA amplicon, SEQ ID NO:1633), UBC (GenBank Accession No. BC000449 (SEQ ID NO:1711); amplicon: Ubiquitin-amplicon, SEQ ID NO:328) and SDHA (GenBank Accession No. NM_004168 (SEQ ID NO:1712); amplicon: SDHA-amplicon, SEQ ID NO:331) was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the lung samples (Sample Nos. 15-17, Table 3 above), to obtain a value of relative expression of each sample relative to the median of the lung samples.

Forward primer (SEQ ID NO: 1691): CCGACGGCTCCTACCTCAA
Reverse primer (SEQ ID NO: 1692): GGAAGCTGTAGCCCATGGTGT
Amplicon (SEQ ID NO: 1693): CCGACGGCTCCTACCTCAATAAGCTGCTCATCACCCGTGCCCGCCAGGACGATGCGGGCATGTACATCTGCCTTGGCGCCAACACCATGGGCTACAGCTTCC

The results are demonstrated in FIG. 77, showing the expression of Homo sapiens fibroblast growth factor receptor-like 1 (FGFRL1) H53626 transcripts, which are detectable by amplicon as depicted in sequence name H53626 seg25 (SEQ ID NO: 1693), in different normal tissues.

Expression of Homo sapiens Fibroblast Growth Factor Receptor-Like 1 (FGFRL1) H53626 Transcripts, which are Detectable by Amplicon as Depicted in Sequence Name H53626 junc24-27F1R3 (SEQ ID NO: 1690) in Different Normal Tissues

Expression of Homo sapiens fibroblast growth factor receptor-like 1 (FGFRL1) transcripts detectable by or according to H53626 junc24-27F1R3 amplicon (SEQ ID NO: 1690) and H53626 junc24-27F1 (SEQ ID NO: 1688) and H53626 junc24-27R3 (SEQ ID NO: 1689) was measured by real time PCR. In parallel the expression of four housekeeping genes: RPL19 (GenBank Accession No. NM_000981 (SEQ ID NO:1715); RPL19 amplicon, SEQ ID NO:1630), TATA box (GenBank Accession No. NM_003194 (SEQ ID NO:1716); TATA amplicon, SEQ ID NO:1633; primers SEQ ID NOs: 1631 and 1632), UBC (GenBank Accession No. BC000449 (SEQ ID NO:1711); amplicon: Ubiquitin-amplicon, SEQ ID NO:328) and SDHA (GenBank Accession No. NM_004168 (SEQ ID NO:1712); amplicon: SDHA-amplicon, SEQ ID NO:331) was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the lung samples (Sample Nos. 15-17, Table 3 above), to obtain a value of relative expression of each sample relative to the median of the lung samples.

Forward primer (SEQ ID NO: 1688): GTCCTTCCAGTGCAAGACCCA
Reverse primer (SEQ ID NO: 1689): TGGGCCTGGCAAAGCC
Amplicon (SEQ ID NO: 1690): GTCCTTCCAGTGCAAGACCCAAAACCGCCAGGGCCACCTGTGGCCTCCTCGTCCTCGGCCACTAGCCTGCCGTGGCCCGTGGTCATCGGCATCCCAGCCGGCGCTGTCTTCATCCTGGGCACCCTGCTCCTGTGGCTTTGCCAGGCCCA

The results are demonstrated in FIG. 78, showing the expression of Homo sapiens fibroblast growth factor receptor-like 1 (FGFRL1) H53626 transcripts, which are detectable by amplicon as depicted in sequence name H53626 junc24-27F1R3 (SEQ ID NO: 1690), in different normal tissues.

Expression of Trophinin Associated Protein (Tastin) [T86235] Transcripts which are Detectable by Amplicon as Depicted in SEQ ID NO:1480 in Normal and Cancerous Lung Tissues

Expression of trophinin associated protein (tastin) transcripts detectable by SEQ ID NO:1480 (e.g., variant no. 23-26, 31, 32, represented by SEQ IDs 1485-1488, 1609, 1610) was measured by real time PCR. In parallel the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323 (SEQ ID NO:1713); amplicon: SEQ ID NO:1471), HPRT1 (GenBank Accession No. NM_000194 (SEQ ID NO:1714); amplicon: SEQ ID NO:1468), Ubiquitin (GenBank Accession No. BC000449 (SEQ ID NO:1711); amplicon: SEQ ID NO:1474) and SDHA (GenBank Accession No. NM_004168 (SEQ ID NO:1712); amplicon: SEQ ID NO:1477), was measured similarly. For each RT sample, the expression of SEQ ID NO:1480 was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 47-50, 90-93, 96-99, Table 2, “Tissue samples in testing panel”, above), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples.

FIG. 54 a is a histogram showing over expression of the above-indicated trophinin associated protein (tastin) transcripts in cancerous lung samples relative to the normal samples. The number and percentage of samples that exhibit at least 5 fold over-expression, out of the total number of samples tested, is indicated at the bottom.

As is evident from FIG. 54 a, the expression of trophinin associated protein (tastin) transcripts detectable by SEQ ID NO:1480 in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 46-50, 90-93, 96-99, Table 2, “Tissue samples in testing panel”). Notably, an over-expression of at least 5 fold was found in 6 out of 15 adenocarcinoma samples, 8 out of 16 squamous cell carcinoma samples, 2 out of 4 large cell carcinoma samples and in 8 out of 8 small cell carcinoma samples.

Statistical analysis was applied to verify the significance of these results, as described below.

The P value for the difference in the expression levels of trophinin associated protein (tastin) transcripts detectable by SEQ ID NO:1480 in lung cancer samples versus the normal lung samples was determined by t-test as 1.61E-04.

A threshold of 5 fold overexpression was found to differentiate between cancer and normal samples with a P value of 1.49E-02, as checked by Fisher's exact test. The above values demonstrate the statistical significance of the results.
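By way of non-limiting illustration only, a threshold-based comparison of this kind may be sketched as follows: each sample is classified as above or below the fold threshold, and the resulting 2x2 table of cancer versus normal samples is evaluated by Fisher's exact test. The sketch assumes a two-sided test (the description does not state whether a one-sided or two-sided test was applied), the function names are hypothetical, and the fold values in the usage lines are invented for illustration and are not the experimental data.

```python
from math import comb


def count_over_threshold(folds, threshold):
    """Return (number at or above threshold, number below threshold)."""
    over = sum(1 for f in folds if f >= threshold)
    return over, len(folds) - over


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_value = sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)  # tolerance for floating-point ties
    )
    return min(1.0, p_value)


if __name__ == "__main__":
    # Hypothetical fold up-regulation values, for illustration only.
    cancer_folds = [12.0, 7.5, 1.2, 30.0, 0.8, 5.0, 2.2, 9.1]
    normal_folds = [1.0, 0.7, 1.4, 0.9, 1.1, 2.0]
    a, b = count_over_threshold(cancer_folds, 5.0)
    c, d = count_over_threshold(normal_folds, 5.0)
    print(fisher_exact_two_sided(a, b, c, d))
```

An exact test is a natural choice for panels of this size, since the cell counts are too small for a chi-square approximation to be reliable.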

According to the present invention, trophinin associated protein (tastin) is a non-limiting example of a marker for diagnosing lung cancer. The trophinin associated protein (tastin) marker of the present invention can be used alone or in combination, for various uses, including but not limited to, prognosis, prediction, screening, early diagnosis, therapy selection and treatment monitoring of lung cancer. Although optionally any method may be used to detect overexpression and/or differential expression of this marker, preferably a NAT-based technology is used. Therefore, optionally and preferably, any nucleic acid molecule capable of selectively hybridizing to trophinin associated protein (tastin) as previously defined is also encompassed within the present invention. Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: trophinin associated protein (tastin)-TAA-seg 44-forward primer (SEQ ID NO: 1478): AGACTCCAACCCACAGCCC; and trophinin associated protein (tastin)-TAA-seg 44-reverse primer (SEQ ID NO: 1479): CAGCTCAGCCAACCTTGCA.

The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: trophinin associated protein (tastin) amplicon, SEQ ID NO: 1480: AGACTCCAACCCACAGCCCAGCTGTGGCTGCACAGTGAGCCTGATGGGAGGTGGGGAACAGGGACAGGGGGCCACCTGGGCTTCTTCACAGAGAGGTCAGCAGGAAGGCTTGGCTACAGTGCAAGGTTGGCTGAGCTG

According to other preferred embodiments of the present invention, trophinin associated protein (tastin) or a fragment thereof comprises a biomarker for detecting lung cancer. Optionally and more preferably, trophinin associated protein (tastin) splice variants, as depicted in SEQ ID NO: 1485-1488, 1609, 1610 (e.g., variant no. 23-26, 31, 32), or a fragment thereof comprise a biomarker for detecting lung cancer. Optionally and more preferably, the fragment of trophinin associated protein (tastin) comprises segment_TAA-44 (SEQ ID NO: 1507). Also optionally and more preferably, any suitable method may be used for detecting a fragment such as trophinin associated protein (tastin)_segment_TAA-44 (SEQ ID NO: 1507), for example. Most preferably, NAT-based technology is used, such as any nucleic acid molecule capable of specifically hybridizing with the fragment. Optionally and most preferably, a primer pair is used for obtaining the fragment.

According to still other preferred embodiments, the present invention optionally and preferably encompasses any amino acid sequence or fragment thereof encoded by a nucleic acid sequence corresponding to trophinin associated protein (tastin) as described above, including but not limited to SEQ ID NOs: 1492-1501, 1612. Any oligopeptide or peptide relating to such an amino acid sequence or fragment thereof may optionally also (additionally or alternatively) be used as a biomarker, including but not limited to the unique amino acid sequences of these proteins that are depicted in SEQ ID NOs: 1508-1511, 1613. The present invention also optionally encompasses antibodies capable of recognizing, and/or being elicited by, such oligopeptides or peptides.

The present invention also optionally and preferably encompasses any nucleic acid sequence or fragment thereof, or amino acid sequence or fragment thereof, corresponding to trophinin associated protein (tastin) as described above, optionally for any application.

Expression of Trophinin Associated Protein (Tastin) [T86235] Transcripts which are Detectable by Oligonucleotides as Depicted in SEQ ID NOs:1512-1514 in Normal and Cancerous Lung Tissues

Expression of trophinin associated protein (tastin) [T86235] transcripts detectable by oligonucleotides SEQ ID NOs: 1512-1514 (e.g., variants no. 8-10, 22, 23, 26, 27, 29-31, 33, represented by SEQ IDs 1481-1485, 1488-1491, 1609, 1611) was measured with oligonucleotide-based micro-arrays. The segments detected by the above oligonucleotides as depicted in SEQ ID NOs: 1512-1514 are, for example, nucleotide sequences as depicted in SEQ IDs 1503, 1504, 1506.

The results of image intensities for each feature were normalized according to the ninetieth percentile of the image intensities of all the features on the chip. Then, feature image intensities for replicates of the same oligonucleotide on the chip and replicates of the same sample were averaged. Outlying results were discarded. For every oligonucleotide (SEQ ID NOs: 1512-1514) the averaged intensity determined for every sample was divided by the averaged intensity of all the normal samples (Sample Nos. 48, 50, 90-92, 96-99, Table 2, “Tissue samples in testing panel”, above), to obtain a value of fold up-regulation for each sample relative to the averaged normal samples. These data are presented in a histogram in FIG. 54 b. As is evident from FIG. 54 b, the expression of trophinin associated protein (tastin) [T86235] transcripts detectable with oligonucleotides according to SEQ ID NOs: 1512-1514 in cancer samples was significantly higher than in the normal samples.
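By way of non-limiting illustration only, the micro-array processing described above may be sketched as follows. The sketch makes several assumptions that the description does not fix: the ninetieth percentile is computed with the inclusive interpolation method of the Python standard library, the criterion for discarding outlying results is not specified and is therefore omitted, and all function names and intensity values are hypothetical.

```python
from statistics import mean, quantiles


def normalize_chip(intensities):
    """Divide every feature intensity on one chip by the chip's
    ninetieth-percentile intensity (assumed percentile definition)."""
    p90 = quantiles(list(intensities.values()), n=10, method="inclusive")[8]
    return {feature: value / p90 for feature, value in intensities.items()}


def average_replicates(replicate_values):
    """Average the normalized intensities of replicate features/samples."""
    return mean(replicate_values)


def fold_vs_normal_mean(averaged, normal_sample_ids):
    """Divide each sample's averaged intensity by the mean averaged
    intensity of the normal samples."""
    reference = mean(averaged[s] for s in normal_sample_ids)
    return {sample: value / reference for sample, value in averaged.items()}


if __name__ == "__main__":
    # Hypothetical chip with eleven features, for illustration only.
    chip = {f"feature_{i}": float(i) for i in range(11)}
    normalized = normalize_chip(chip)
    # Hypothetical averaged intensities of one oligonucleotide per sample.
    averaged = {
        "normal_1": average_replicates([0.9, 1.1]),
        "normal_2": average_replicates([2.8, 3.2]),
        "tumor_1": average_replicates([7.5, 8.5]),
    }
    print(normalized["feature_9"], fold_vs_normal_mean(averaged, ["normal_1", "normal_2"]))
```

Scaling by a high percentile rather than by the maximum keeps a few saturated features from dominating the normalization of the whole chip.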

According to the present invention, trophinin associated protein (tastin) is a non-limiting example of a marker for diagnosing lung cancer. Although optionally any method may be used to detect overexpression and/or differential expression of this marker, preferably a NAT-based technology is used. Therefore, optionally and preferably, any nucleic acid molecule capable of selectively hybridizing to trophinin associated protein (tastin) as previously defined is also encompassed within the present invention. Oligonucleotides are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following oligonucleotides were used as a non-limiting illustrative example only of suitable oligonucleotides: SEQ ID NOs: 1512-1514.

(SEQ ID NO: 1512) CATGGTAACACGGCCTCCATGGCTGAGTAGGGGACTAGGAAGGGTAAAAG
(SEQ ID NO: 1513) TGTACATCTAGGGCCTCTCAGTTAGGGGCTTCAATCCATTCCTCATGAGG
(SEQ ID NO: 1514) TGTGAACACAAGAGGTCCTCACCTCACTGTGAGCTGCACACCTGCCCTGC

According to other preferred embodiments of the present invention, trophinin associated protein (tastin) or a fragment thereof comprises a biomarker for detecting lung cancer. Optionally and more preferably, trophinin associated protein (tastin) splice variants, as depicted in SEQ ID NO: 1481-1485, 1488-1491, 1609, 1611 (e.g., variant no. 8-10, 22, 23, 26, 27, 29-31, 33), or a fragment thereof comprise a biomarker for detecting lung cancer. Optionally and more preferably, the fragment of trophinin associated protein (tastin) comprises segment_TAA-14, 35 and 42 (SEQ ID NOs: 1503, 1504 and 1506). Also optionally and more preferably, any suitable method may be used for detecting a fragment such as trophinin associated protein (tastin)_segment_TAA-14, 35 and 42 (SEQ ID NOs: 1503, 1504 and 1506), for example. Most preferably, NAT-based technology is used, such as any nucleic acid molecule capable of specifically hybridizing with the fragment. Optionally and most preferably, a primer pair is used for obtaining the fragment.

According to other preferred embodiments of the present invention, trophinin associated protein (tastin) splice variants containing the unique segments as depicted in SEQ ID NOs: 1502 and 1505, for example as those included in variants 9 and 29 (SEQ ID NOs: 1482 and 1490, respectively), are useful as biomarkers for detecting lung cancer.

The present invention also optionally and preferably encompasses any nucleic acid sequence or fragment thereof, or amino acid sequence or fragment thereof, corresponding to trophinin associated protein (tastin) as described above, optionally for any application.

Expression of Homeo Box C10 (HOXC10) [N31842] Transcripts which are Detectable by Amplicon as Depicted in SEQ ID NO:1517 in Normal and Cancerous Lung Tissues

Expression of Homeo box C10 (HOXC10) transcripts detectable by SEQ ID NO: 1517 (e.g., variant no. 3, represented by SEQ ID 1519) was measured by real time PCR. In parallel the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323 (SEQ ID NO:1713); amplicon: SEQ ID NO:1471), HPRT1 (GenBank Accession No. NM_000194 (SEQ ID NO:1714); amplicon: SEQ ID NO:3), Ubiquitin (GenBank Accession No. BC000449 (SEQ ID NO:1711); amplicon: SEQ ID NO:9) and SDHA (GenBank Accession No. NM_004168 (SEQ ID NO:1712); amplicon: SEQ ID NO:1477), was measured similarly. For each RT sample, the expression of SEQ ID NO:1517 was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 47-50, 90-93, 96-99, Table 2, “Tissue samples in testing panel”, above), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples.

FIG. 55 is a histogram showing over expression of the above-indicated Homeo box C10 (HOXC10) transcripts in cancerous lung samples relative to the normal samples. The number and percentage of samples that exhibit at least 20 fold over-expression, out of the total number of samples tested, is indicated at the bottom.

As is evident from FIG. 55, the expression of Homeo box C10 (HOXC10) transcripts detectable by SEQ ID NO: 1517 in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 46-50, 90-93, 96-99, Table 2, “Tissue samples in testing panel”). Notably, an over-expression of at least 20 fold was found in 6 out of 15 adenocarcinoma samples, 9 out of 16 squamous cell carcinoma samples, and in 3 out of 4 large cell carcinoma samples.

Statistical analysis was applied to verify the significance of these results, as described below. The P value for the difference in the expression levels of Homeo box C10 (HOXC10) transcripts detectable by SEQ ID NO: 1517 in lung cancer samples versus the normal lung samples was determined by t-test as 4.43E-03. A threshold of 20 fold overexpression was found to differentiate between cancer and normal samples with a P value of 2.88E-02, as checked by Fisher's exact test. The above values demonstrate the statistical significance of the results.

According to the present invention, Homeo box C10 (HOXC10) is a non-limiting example of a marker for diagnosing lung cancer. The Homeo box C10 (HOXC10) marker of the present invention can be used alone or in combination, for various uses, including but not limited to, prognosis, prediction, screening, early diagnosis, therapy selection and treatment monitoring of lung cancer. Although optionally any method may be used to detect overexpression and/or differential expression of this marker, preferably a NAT-based technology is used. Therefore, optionally and preferably, any nucleic acid molecule capable of selectively hybridizing to Homeo box C10 (HOXC10) as previously defined is also encompassed within the present invention. Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: Homeo box C10 (HOXC10)-forward primer (SEQ ID NO: 1515): GCGAAACGCGATTTGTTGTT; and Homeo box C10 (HOXC10)-reverse primer (SEQ ID NO: 1516): CATCTGGAGGAGGGAGGGA.

The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: Homeo box C10 (HOXC10) amplicon (SEQ ID NO: 1517): GCGAAACGCGATTTGTTGTTTGTGGGTCTGATTTGTGCGTGCGGCTTGGGCTCCTGCGGCTTTTGGCTCGGCCGGGGGCCTTGGGCAGCGAGGCTGGAGCCGGAAGAGGTGGAGGTGAAGGGCTGCCCGCCACGTCCCTCCCTCCCTCCAGATG.

According to other preferred embodiments of the present invention, Homeo box C10 (HOXC10) or a fragment thereof comprises a biomarker for detecting lung cancer. Optionally and more preferably, Homeo box C10 (HOXC10) splice variants, as depicted in SEQ ID NO:54 (e.g., variant no. 3), or a fragment thereof comprise a biomarker for detecting lung cancer. Optionally and more preferably, the fragment of Homeo box C10 (HOXC10) comprises segment_TAA-seg 6 (SEQ ID NO: 1526). Also optionally and more preferably, any suitable method may be used for detecting a fragment such as Homeo box C10 (HOXC10)_segment_TAA-seg 6 (SEQ ID NO: 1526), for example. Most preferably, NAT-based technology is used, such as any nucleic acid molecule capable of specifically hybridizing with the fragment. Optionally and most preferably, a primer pair is used for obtaining the fragment.

According to other preferred embodiments of the present invention, Homeo box C10 (HOXC10) splice variants containing the unique segments as depicted in SEQ ID NOs: 1524 and 1525, for example transcripts as depicted in SEQ ID NO: 1515, 1519 and 1520, comprise a biomarker for detecting lung cancer.

According to still other preferred embodiments, the present invention optionally and preferably encompasses any amino acid sequence or fragment thereof encoded by a nucleic acid sequence corresponding to Homeo box C10 (HOXC10) as described above, including but not limited to SEQ ID NOs: 1521 and 1522. Any oligopeptide or peptide relating to such an amino acid sequence or fragment thereof may optionally also (additionally or alternatively) be used as a biomarker, including but not limited to the unique amino acid sequence of the protein SEQ ID NO: 1522, as depicted in SEQ ID NO: 1523. The present invention also optionally encompasses antibodies capable of recognizing, and/or being elicited by, such oligopeptides or peptides.

The present invention also optionally and preferably encompasses any nucleic acid sequence or fragment thereof, or amino acid sequence or fragment thereof, corresponding to Homeo box C10 (HOXC10) as described above, optionally for any application.

Expression of Nucleolar Protein 4 (NOL4) [T06014] Transcripts which are Detectable by Amplicon as Depicted in SEQ ID NO:1529 in Normal and Cancerous Lung Tissues

Expression of Nucleolar protein 4 (NOL4) transcripts detectable by SEQ ID NO:1529 (e.g., variant no. 3, 11 and 12, represented by SEQ IDs 1533, 1537, 1538) was measured by real time PCR. In parallel the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323 (SEQ ID NO:1713); amplicon: SEQ ID NO:1471), HPRT1 (GenBank Accession No. NM_000194 (SEQ ID NO:1714); amplicon: SEQ ID NO:1468), Ubiquitin (GenBank Accession No. BC000449 (SEQ ID NO:1711); amplicon: SEQ ID NO:1474) and SDHA (GenBank Accession No. NM_004168 (SEQ ID NO:1712); amplicon: SEQ ID NO:1477), was measured similarly. For each RT sample, the expression of SEQ ID NO:1529 was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 47-50, 90-93, 96-99, Table 2, above, “Tissue samples in testing panel”), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples.

FIGS. 56 a and 56 b are histograms showing over-expression of the above-indicated Nucleolar protein 4 (NOL4) transcripts in cancerous lung samples relative to the normal samples. The number and percentage of samples that exhibit at least 200 fold or 6 fold over-expression, out of the total number of samples tested, are indicated at the bottom of FIGS. 56 a and 56 b, respectively.

As is evident from FIG. 56 a, the expression of Nucleolar protein 4 (NOL4) transcripts detectable by SEQ ID NO:1529 in the samples originating from small cell carcinoma of the lung was significantly higher than in the non-cancerous samples (Sample Nos. 46-50, 90-93, 96-99, Table 2, “Tissue samples in testing panel”). Notably, an over-expression of at least 200 fold was found in 8 out of 8 small cell carcinoma samples. As is evident from FIG. 56 b, over-expression of at least 6 fold was also observed in 2 out of 15 adenocarcinoma samples and 3 out of 16 squamous cell carcinoma samples.

Statistical analysis was applied to verify the significance of these results, as described below.

The P value for the difference in the expression levels of Nucleolar protein 4 (NOL4) transcripts detectable by SEQ ID NO:1529 in lung cancer samples versus the normal lung samples was determined by T test as 1.36E-02.

A threshold of 6 fold overexpression was found to differentiate between cancer and normal samples with a P value of 2.52E-02, as checked by Fisher's exact test.

The P value for the difference in the expression levels of Nucleolar protein 4 (NOL4) transcripts detectable by SEQ ID NO:1529 in lung small cell carcinoma samples versus the normal lung samples was determined by T test as 3.86E-03.

A threshold of 200 fold overexpression was found to differentiate between small cell carcinoma and normal lung samples with a P value of 7.94E-06, as checked by Fisher's exact test. The above values demonstrate statistical significance of the results.
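The exact-test P value for the 200 fold threshold can be reproduced with a short calculation. The sketch below is a hedged reconstruction, not a statement of the software actually used. It assumes a one-sided test, assumes that the normal group consists of the 12 PM samples (Sample Nos. 47-50, 90-93, 96-99), and assumes that none of them reaches the threshold; none of these points is stated explicitly in the text, and the function name is hypothetical. Under these assumptions the P value for 8 out of 8 small cell carcinoma samples is 1/C(20,8) = 1/125970, which agrees with the reported 7.94E-06.

```python
from math import comb


def fisher_exact_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a, b: cancer samples at/above and below the threshold
    c, d: normal samples at/above and below the threshold
    Returns the probability of seeing at least `a` cancer samples above the
    threshold when all row and column totals are held fixed.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    denom = comb(total, row1)
    p = 0.0
    # Sum the hypergeometric probabilities of this table and all more extreme ones.
    for k in range(a, min(row1, col1) + 1):
        p += comb(col1, k) * comb(total - col1, row1 - k) / denom
    return p


# 8 of 8 small cell carcinoma samples above threshold; 0 of 12 normals (assumed).
p_small_cell = fisher_exact_one_sided(8, 0, 0, 12)
print(f"{p_small_cell:.2E}")  # prints 7.94E-06
```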

According to the present invention, Nucleolar protein 4 (NOL4) is a non-limiting example of a marker for diagnosing lung cancer. The Nucleolar protein 4 (NOL4) marker of the present invention can be used alone or in combination, for various uses, including but not limited to, prognosis, prediction, screening, early diagnosis, therapy selection and treatment monitoring of lung cancer. Although optionally any method may be used to detect overexpression and/or differential expression of this marker, preferably a NAT-based technology is used. Therefore, optionally and preferably, any nucleic acid molecule capable of selectively hybridizing to Nucleolar protein 4 (NOL4) as previously defined is also encompassed within the present invention. Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: Nucleolar protein 4 (NOL4)-TAA-seg1-forward primer (SEQ ID NO:1527): CTCGCTCCCTTGCTCACAC; and Nucleolar protein 4 (NOL4)-TAA-seg1-Reverse primer (SEQ ID NO:1528): AAAGGGAAAGCGGGATGTTT.

The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: Nucleolar protein 4 (NOL4) amplicon (SEQ ID NO:1529): CTCGCTCCCTTGCTCACACACACGCACACACTCAGCCTGGCCGAGCAGGAGCCACTGACCATTTTGCAAGTGTCAGGACCAGCTACAGCGCGGTGGGCGCAAACATCCCGCTTTCCCTTT.
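The relationship between a primer pair and its amplicon can be checked mechanically: the amplicon should begin with the forward primer and end with the reverse complement of the reverse primer. The following is a minimal sketch of that check using the sequences quoted above; the function names are hypothetical and the check is an illustration only, not part of the described experiment.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def primers_flank_amplicon(forward, reverse, amplicon):
    """True if the amplicon starts with the forward primer and ends with
    the reverse complement of the reverse primer."""
    return amplicon.startswith(forward) and amplicon.endswith(
        reverse_complement(reverse)
    )


# Sequences as listed for SEQ ID NOs: 1527, 1528 and 1529.
forward_primer = "CTCGCTCCCTTGCTCACAC"
reverse_primer = "AAAGGGAAAGCGGGATGTTT"
amplicon = (
    "CTCGCTCCCTTGCTCACACACACGCACACACTCAGCCTGGCCGAGCAGGAGCCACTGACC"
    "ATTTTGCAAGTGTCAGGACCAGCTACAGCGCGGTGGGCGCAAACATCCCGCTTTCCCTTT"
)
is_consistent = primers_flank_amplicon(forward_primer, reverse_primer, amplicon)
```

The same check can be applied to any of the other primer pairs and amplicons listed in this section.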

According to other preferred embodiments of the present invention, Nucleolar protein 4 (NOL4) or a fragment thereof comprises a biomarker for detecting lung cancer. Optionally and more preferably, Nucleolar protein 4 (NOL4) splice variants, as depicted in SEQ ID NO:1529 (e.g., variants nos. 3, 11 and 12), or a fragment thereof comprise a biomarker for detecting lung cancer. Optionally and more preferably, the fragment of Nucleolar protein 4 (NOL4) comprises segment_TAA-seg-1 (SEQ ID NO: 1552). Also optionally and more preferably, any suitable method may be used for detecting a fragment such as Nucleolar protein 4 (NOL4)_segment_TAA-seg-1 (SEQ ID NO: 1552) for example. Most preferably, NAT-based technology is used, such as any nucleic acid molecule capable of specifically hybridizing with the fragment. Optionally and most preferably, a primer pair is used for obtaining the fragment.

According to other preferred embodiments of the present invention, Nucleolar protein 4 (NOL4) splice variants containing the unique segments as depicted in SEQ ID NOs: 1554 and 1555, for example transcripts as depicted in SEQ ID NOs: 1534-1536 and 1539-1541, comprise a biomarker for detecting lung cancer.

According to still other preferred embodiments, the present invention optionally and preferably encompasses any amino acid sequence or fragment thereof encoded by a nucleic acid sequence corresponding to Nucleolar protein 4 (NOL4) as described above, including but not limited to SEQ ID NOs: 1542, 1547 and 1543; 1548, 1545, 1546, and 1549-1551. Any oligopeptide or peptide relating to such an amino acid sequence or fragment thereof may optionally also (additionally or alternatively) be used as a biomarker, including but not limited to the unique amino acid sequence of the proteins SEQ ID NOs: 1543, 1546 and 1549, as depicted in SEQ ID NO:1544.

The present invention also optionally encompasses antibodies capable of recognizing, and/or being elicited by, such oligopeptides or peptides.

The present invention also optionally and preferably encompasses any nucleic acid sequence or fragment thereof, or amino acid sequence or fragment thereof, corresponding to Nucleolar protein 4 (NOL4) as described above, optionally for any application.

Expression of Nucleolar Protein 4 (NOL4)-[T06014] Transcripts which are Detectable by Amplicon as Depicted in SEQ ID NO:1532 in Normal and Cancerous Lung Tissues

Expression of Nucleolar protein 4 (NOL4) transcripts detectable by SEQ ID NO:1532 (e.g., variants no. 3, 11 and 12, represented by SEQ ID NOs: 1533, 1537 and 1538) was measured by real time PCR. In parallel, the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323 (SEQ ID NO:1713); amplicon: SEQ ID NO:1471), HPRT1 (GenBank Accession No. NM_000194 (SEQ ID NO:1714); amplicon: SEQ ID NO:1468), Ubiquitin (GenBank Accession No. BC000449 (SEQ ID NO:1711); amplicon: SEQ ID NO:1474) and SDHA (GenBank Accession No. NM_004168 (SEQ ID NO:1712); amplicon: SEQ ID NO: 1481), was measured similarly. For each RT sample, the expression of SEQ ID NO:1532 was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 47-50, 90-93, 96-99, Table 2, “Tissue samples in testing panel”, above), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples.

FIGS. 57 a and 57 b are histograms showing over-expression of the above-indicated Nucleolar protein 4 (NOL4) transcripts in cancerous lung samples relative to the normal samples. The number and percentage of samples that exhibit at least 400 fold or 6 fold over-expression, out of the total number of samples tested, are indicated at the bottom of FIGS. 57 a and 57 b, respectively.
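The per-histology counts reported with such histograms amount to tallying, for each tumor type, how many samples reach a given fold threshold. A minimal sketch follows, assuming fold values have already been computed per sample; the function name, the sample labels and the fold values are hypothetical and are not taken from the figures.

```python
from collections import defaultdict


def count_over_threshold(folds, histology, threshold):
    """Return {histology: (n_over, n_total)} for samples with fold >= threshold."""
    counts = defaultdict(lambda: [0, 0])
    for sample, fold in folds.items():
        group = counts[histology[sample]]
        group[1] += 1          # every sample counts toward the group total
        if fold >= threshold:
            group[0] += 1      # sample reaches the over-expression threshold
    return {name: (over, total) for name, (over, total) in counts.items()}


# Hypothetical fold values (illustrative only).
folds = {"S1": 450.0, "S2": 900.0, "A1": 7.5, "A2": 1.2, "A3": 0.8}
histology = {
    "S1": "small cell carcinoma",
    "S2": "small cell carcinoma",
    "A1": "adenocarcinoma",
    "A2": "adenocarcinoma",
    "A3": "adenocarcinoma",
}
summary_400 = count_over_threshold(folds, histology, 400)
summary_6 = count_over_threshold(folds, histology, 6)
```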

As is evident from FIG. 57 a, the expression of Nucleolar protein 4 (NOL4) transcripts detectable by SEQ ID NO:1532 in the samples originating from small cell carcinoma of the lung was significantly higher than in the non-cancerous samples (Sample Nos. 46-50, 90-93, 96-99, Table 2, “Tissue samples in testing panel”). Notably, an over-expression of at least 400 fold was found in 8 out of 8 small cell carcinoma samples. As is evident from FIG. 57 b, over-expression of at least 6 fold was also observed in 4 out of 15 adenocarcinoma samples and 3 out of 16 squamous cell carcinoma samples.

Statistical analysis was applied to verify the significance of these results, as described below.

The P value for the difference in the expression levels of Nucleolar protein 4 (NOL4) transcripts detectable by SEQ ID NO:1532 in lung cancer samples versus the normal lung samples was determined by T test as 1.70E-02.

A threshold of 6 fold overexpression was found to differentiate between cancer and normal samples with a P value of 1.80E-02, as checked by Fisher's exact test.

The P value for the difference in the expression levels of Nucleolar protein 4 (NOL4) transcripts detectable by SEQ ID NO:1532 in lung small cell carcinoma samples versus the normal lung samples was determined by T test as 7.08E-03.

A threshold of 400 fold overexpression was found to differentiate between small cell carcinoma and normal lung samples with a P value of 1.03E-04, as checked by Fisher's exact test. The above values demonstrate statistical significance of the results.

According to the present invention, Nucleolar protein 4 (NOL4) is a non-limiting example of a marker for diagnosing lung cancer. The Nucleolar protein 4 (NOL4) marker of the present invention can be used alone or in combination, for various uses, including but not limited to, prognosis, prediction, screening, early diagnosis, therapy selection and treatment monitoring of lung cancer. Although optionally any method may be used to detect overexpression and/or differential expression of this marker, preferably a NAT-based technology is used. Therefore, optionally and preferably, any nucleic acid molecule capable of selectively hybridizing to Nucleolar protein 4 (NOL4) as previously defined is also encompassed within the present invention. Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: Nucleolar protein 4 (NOL4)-TAA-seg 3-forward primer (SEQ ID NO: 1530): ACATCCCCCTGGAACGGAT; and Nucleolar protein 4 (NOL4)-TAA-seg 3-Reverse primer (SEQ ID NO:1531): CAGAAATTAGCAAAGCATTGATGG.

The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: Nucleolar protein 4 (NOL4) amplicon (SEQ ID NO: 1532): ACATCCCCCTGGAACGGATATCTGTTTGGGGCACTACAATCTATCCTGTAGAACTATGGCCAAATCTCCATCAATGCTTTGCTAATTTCTG.

According to other preferred embodiments of the present invention, Nucleolar protein 4 (NOL4) or a fragment thereof comprises a biomarker for detecting lung cancer. Optionally and more preferably, Nucleolar protein 4 (NOL4) splice variants, as depicted in SEQ ID NOs: 1533, 1537 and 1538 (e.g., variants nos. 3, 11 and 12), or a fragment thereof comprise a biomarker for detecting lung cancer. Optionally and more preferably, the fragment of Nucleolar protein 4 (NOL4) comprises segment_TAA-seg-3 (SEQ ID NO: 1553). Also optionally and more preferably, any suitable method may be used for detecting a fragment such as Nucleolar protein 4 (NOL4)_segment_TAA-seg-3 (SEQ ID NO: 1553) for example. Most preferably, NAT-based technology is used, such as any nucleic acid molecule capable of specifically hybridizing with the fragment. Optionally and most preferably, a primer pair is used for obtaining the fragment.

According to still other preferred embodiments, the present invention optionally and preferably encompasses any amino acid sequence or fragment thereof encoded by a nucleic acid sequence corresponding to Nucleolar protein 4 (NOL4) as described above, including but not limited to SEQ ID NOs: 1542, 1547 and 1548. Any oligopeptide or peptide relating to such an amino acid sequence or fragment thereof may optionally also (additionally or alternatively) be used as a biomarker.

The present invention also optionally encompasses antibodies capable of recognizing, and/or being elicited by, such oligopeptides or peptides.

The present invention also optionally and preferably encompasses any nucleic acid sequence or fragment thereof, or amino acid sequence or fragment thereof, corresponding to Nucleolar protein 4 (NOL4) as described above, optionally for any application.

Expression of AA281370 Transcripts which are Detectable by Amplicon as Depicted in SEQ ID NO:1558 in Normal and Cancerous Lung Tissues

The AA281370 gene was identified by a computational process described above as over-expressed in lung cancer. The AA281370 encoded proteins (SEQ ID NOs: 1563 and 1564) contain several WD40 domains, which are found in a number of eukaryotic proteins that cover a wide variety of functions, including adaptor/regulatory modules in signal transduction, pre-mRNA processing and cytoskeleton assembly. As is demonstrated in FIG. 63, the WD40 domain region of the AA281370 encoded protein, depicted in SEQ ID NO: 1564, has several similarities that might suggest involvement in the MAPK signal transduction pathway. For example, the region of the AA281370 polypeptide SEQ ID NO: 1564 located between amino acids at positions 40-790 has 75% homology to the WD40 domain region of mouse Mapkbp1 protein (gi|47124622) (FIG. 63 a); and the region between amino acids at positions 40-886 of the AA281370 polypeptide SEQ ID NO: 1564 has 70% homology to rat JNK-binding protein JNKBP1 (gi|34856717) (FIG. 63 b).

Expression of AA281370 transcripts detectable by SEQ ID NO: 1558 (e.g., variants no. 0, 1, 4 and 5, represented in SEQ ID NOs: 1559-1562) was measured by real time PCR. In parallel, the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323 (SEQ ID NO:1713); amplicon: SEQ ID NO:1471), HPRT1 (GenBank Accession No. NM_000194 (SEQ ID NO:1714); amplicon: SEQ ID NO:1468), Ubiquitin (GenBank Accession No. BC000449 (SEQ ID NO:1711); amplicon: SEQ ID NO:1474) and SDHA (GenBank Accession No. NM_004168 (SEQ ID NO:1712); amplicon: SEQ ID NO:1477), was measured similarly. For each RT sample, the expression of SEQ ID NO:1558 was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 47-50, 90-93, 96-99, Table 2, “Tissue samples in testing panel”, above), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples.

FIG. 58 is a histogram showing over-expression of the above-indicated AA281370 transcripts in cancerous lung samples relative to the normal samples. The number and percentage of samples that exhibit at least 6 fold over-expression, out of the total number of samples tested, are indicated at the bottom.

As is evident from FIG. 58, the expression of AA281370 transcripts detectable by SEQ ID NO:1558 in cancer samples was significantly higher than in the non-cancerous samples (Sample Nos. 46-50, 90-93, 96-99, Table 2, “Tissue samples in testing panel”). Notably, an over-expression of at least 6 fold was found in 8 out of 8 small cell carcinoma samples, in 2 out of 16 squamous cell carcinoma samples, and in 1 out of 4 large cell carcinoma samples.

Statistical analysis was applied to verify the significance of these results, as described below.

The P value for the difference in the expression levels of AA281370 transcripts detectable by SEQ ID NO:1558 in lung cancer samples versus the normal lung samples was determined by T test as 8.58E-07.

A threshold of 6 fold overexpression was found to differentiate between cancer and normal samples with a P value of 4.81E-02, as checked by Fisher's exact test. The above values demonstrate statistical significance of the results.
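The text does not state which form of the T test was applied, nor whether the fold values were transformed beforehand. Purely as an illustration, the sketch below computes the test statistic and degrees of freedom for the unequal-variance (Welch) form, which is an assumption made here and not a description of the original analysis. The function name and the input values are hypothetical; converting the statistic to a P value would additionally require the t distribution, which is outside this sketch.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return (t statistic, degrees of freedom) for Welch's two-sample t test."""
    # Squared standard error of each group mean (sample variance / n).
    se_a = variance(group_a) / len(group_a)
    se_b = variance(group_b) / len(group_b)
    t_stat = (mean(group_a) - mean(group_b)) / sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(group_a) - 1) + se_b ** 2 / (len(group_b) - 1)
    )
    return t_stat, df


# Hypothetical fold values for cancer and normal samples (illustrative only).
t_stat, df = welch_t([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
```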

According to the present invention, AA281370 transcripts are a non-limiting example of a marker for diagnosing lung cancer. The AA281370 marker of the present invention can be used alone or in combination, for various uses, including but not limited to, prognosis, prediction, screening, early diagnosis, therapy selection and treatment monitoring of lung cancer. Although optionally any method may be used to detect overexpression and/or differential expression of this marker, preferably a NAT-based technology is used. Therefore, optionally and preferably, any nucleic acid molecule capable of selectively hybridizing to AA281370 as previously defined is also encompassed within the present invention. Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: AA281370-forward primer (SEQ ID NO: 1556): GGTTCGGATGGACTACACTTTGTC; and AA281370-Reverse primer (SEQ ID NO: 1557): CCACGTACTTCTGGGTGATGTC.

The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: AA281370-amplicon (SEQ ID NO: 1558): GGTTCGGATGGACTACACTTTGTCCGTACCCACCACGTAGCAGAGAAAACCACCTTGTATGACATGGACATTGACATCACCCAGAAGTACGTGG.

According to other preferred embodiments of the present invention, AA281370 or a fragment thereof comprises a biomarker for detecting lung cancer. Optionally and more preferably, AA281370 splice variants, as depicted in SEQ ID NO:1558 (e.g., variants no. 0, 1, 4 and 5), or a fragment thereof comprise a biomarker for detecting lung cancer. Optionally and more preferably, the fragment of AA281370 comprises segment_TAA seg 10 (SEQ ID NO: 1567). Also optionally and more preferably, any suitable method may be used for detecting a fragment such as AA281370_segment_TAA seg 10 (SEQ ID NO: 1567) for example. Most preferably, NAT-based technology is used, such as any nucleic acid molecule capable of specifically hybridizing with the fragment. Optionally and most preferably, a primer pair is used for obtaining the fragment.

According to other preferred embodiments, the present invention also optionally and preferably encompasses AA281370 splice variants containing the unique segment as depicted in SEQ ID NO: 1568, for example transcripts 4 and 5, as depicted in SEQ ID NOs: 1561 and 1562, as a biomarker for detecting lung cancer.

According to still other preferred embodiments, the present invention optionally and preferably encompasses any amino acid sequence or fragment thereof encoded by a nucleic acid sequence corresponding to AA281370 as described above, including but not limited to SEQ ID NOs: 1563-1566. Any oligopeptide or peptide relating to such an amino acid sequence or fragment thereof may optionally also (additionally or alternatively) be used as a biomarker, including but not limited to the unique amino acid sequence of the proteins SEQ ID NOs: 1563-1566, as depicted in SEQ ID NOs: 1569, 1570 and 1571.

The present invention also optionally encompasses antibodies capable of recognizing, and/or being elicited by, such oligopeptides or peptides.

The present invention also optionally and preferably encompasses any nucleic acid sequence or fragment thereof, or amino acid sequence or fragment thereof, corresponding to AA281370 as described above, optionally for any application.

Expression of Sulfatase 1 (SULF1)-[221368] Transcripts which are Detectable by Amplicon as Depicted in SEQ ID NO:1574 in Normal and Cancerous Lung Tissues

SULF1 is a secreted protein which is found in the extracellular matrix. It is known to be downregulated in many epithelial cancer types.

Expression of Sulfatase 1 (SULF1) transcripts detectable by SEQ ID NO:1574 (e.g., variants no. 13 and 14, represented in SEQ ID NOs: 1578 and 1579) was measured by real time PCR. In parallel, the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323 (SEQ ID NO:1713); amplicon: SEQ ID NO:1471), HPRT1 (GenBank Accession No. NM_000194 (SEQ ID NO:1714); amplicon: SEQ ID NO:1468), Ubiquitin (GenBank Accession No. BC000449 (SEQ ID NO:1711); amplicon: SEQ ID NO:1474) and SDHA (GenBank Accession No. NM_004168 (SEQ ID NO:1712); amplicon: SEQ ID NO:1477), was measured similarly. For each RT sample, the expression of SEQ ID NO: 1574 was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 47-50, 90-93, 96-99, Table 2, “Tissue samples in testing panel”, above), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples.

FIG. 59 is a histogram showing over-expression of the above-indicated Sulfatase 1 (SULF1) transcripts in cancerous lung samples relative to the normal samples. The number and percentage of samples that exhibit at least 8 fold over-expression, out of the total number of samples tested, are indicated at the bottom.

As is evident from FIG. 59, the expression of Sulfatase 1 (SULF1) transcripts detectable by SEQ ID NO: 1574 in cancer samples originating from non-small cell carcinoma was significantly higher than in the non-cancerous samples (Sample Nos. 46-50, 90-93, 96-99, Table 2, “Tissue samples in testing panel”). Notably, an over-expression of at least 8 fold was found in 11 out of 15 adenocarcinoma samples, in 11 out of 16 squamous cell carcinoma samples, and in 4 out of 4 large cell carcinoma samples.

Statistical analysis was applied to verify the significance of these results, as described below.

The P value for the difference in the expression levels of Sulfatase 1 (SULF1) transcripts detectable by SEQ ID NO: 1574 in lung cancer samples versus the normal lung samples was determined by T test as 3.18E-07. A threshold of 8 fold overexpression was found to differentiate between cancer and normal samples with a P value of 1.18E-04, as checked by Fisher's exact test.

The above values demonstrate statistical significance of the results.

According to the present invention, Sulfatase 1 (SULF1) is a non-limiting example of a marker for diagnosing lung cancer. The Sulfatase 1 (SULF1) marker of the present invention can be used alone or in combination, for various uses, including but not limited to, prognosis, prediction, screening, early diagnosis, therapy selection and treatment monitoring of lung cancer. Although optionally any method may be used to detect overexpression and/or differential expression of this marker, preferably a NAT-based technology is used. Therefore, optionally and preferably, any nucleic acid molecule capable of selectively hybridizing to Sulfatase 1 (SULF1) as previously defined is also encompassed within the present invention. Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: Sulfatase 1 (SULF1)-forward primer (SEQ ID NO: 1572): ACTCACTCAGAGACTAACACAAAGGAAG; and Sulfatase 1 (SULF1)-Reverse primer (SEQ ID NO: 1573): AGTATGGGAAGAATTTACTGGTCACA.

The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: Sulfatase 1 (SULF1)-amplicon (SEQ ID NO: 1574): ACTCACTCAGAGACTAACACAAAGGAAGTAATTTCTTACCTGGTCATTATTTAGTCTACAATAAGTTCATCCTTCTTCAGTGTGACCAGTAAATTCTTCCCATACT.

According to other preferred embodiments of the present invention, Sulfatase 1 (SULF1) or a fragment thereof comprises a biomarker for detecting lung cancer. Optionally and more preferably, Sulfatase 1 (SULF1) splice variants, as depicted in SEQ ID NOs: 1578 and 1579 (e.g., variants no. 13 and 14), or a fragment thereof comprise a biomarker for detecting lung cancer. Optionally and more preferably, the fragment of Sulfatase 1 (SULF1) comprises segment_TAA seg 5 (SEQ ID NO: 1587). Also optionally and more preferably, any suitable method may be used for detecting a fragment such as Sulfatase 1 (SULF1)_segment_TAA seg 5 (SEQ ID NO: 1587) for example. Most preferably, NAT-based technology is used, such as any nucleic acid molecule capable of specifically hybridizing with the fragment. Optionally and most preferably, a primer pair is used for obtaining the fragment.

According to other preferred embodiments of the present invention, Sulfatase 1 (SULF1) splice variants containing the unique segments as depicted in SEQ ID NOs: 1588-1591, for example transcripts as depicted in SEQ ID NOs: 1575-1577, comprise a biomarker for detecting lung cancer.

According to still other preferred embodiments, the present invention optionally and preferably encompasses any amino acid sequence or fragment thereof encoded by a nucleic acid sequence corresponding to Sulfatase 1 (SULF1) as described above, including but not limited to SEQ ID NOs: 1586, 1580, 1582 and 1584. Any oligopeptide or peptide relating to such an amino acid sequence or fragment thereof may optionally also (additionally or alternatively) be used as a biomarker, including but not limited to the unique amino acid sequence of the proteins SEQ ID NOs: 1580, 1582 and 1584, as depicted in SEQ ID NOs: 1581, 1583 and 1585, respectively.

The present invention also optionally encompasses antibodies capable of recognizing, and/or being elicited by, such oligopeptides or peptides.

The present invention also optionally and preferably encompasses any nucleic acid sequence or fragment thereof, or amino acid sequence or fragment thereof, corresponding to Sulfatase 1 (SULF1) as described above, optionally for any application.

Expression of SRY (Sex Determining Region Y)-box 2 (SOX2)-[HUMHMGBOX] Transcripts which are Detectable by the Amplicon as Depicted in SEQ ID NO:1594 in Normal and Cancerous Lung Tissues

Expression of SOX2 transcripts detectable by SEQ ID NO:1594 (e.g., variant no. 0, represented by SEQ ID NO: 1595) was measured by real time PCR. In parallel, the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323 (SEQ ID NO:1713); amplicon: SEQ ID NO:1471), HPRT1 (GenBank Accession No. NM_000194 (SEQ ID NO:1714); amplicon: SEQ ID NO:1468), Ubiquitin (GenBank Accession No. BC000449 (SEQ ID NO:1711); amplicon: SEQ ID NO:1474) and SDHA (GenBank Accession No. NM_004168 (SEQ ID NO:1712); amplicon: SEQ ID NO: 1477), was measured similarly. For each RT sample, the expression of SEQ ID NO: 1594 was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 47-50, 90-93, 96-99, Table 2, “Tissue samples in testing panel”, above), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples.

FIG. 60 is a histogram showing over-expression of the above-indicated SOX2 transcripts in cancerous lung samples relative to the normal samples. The number and percentage of samples that exhibit at least 5 fold over-expression, out of the total number of samples tested, are indicated at the bottom.

As is evident from FIG. 60, the expression of SOX2 transcripts detectable by SEQ ID NO: 1594 in cancer samples originating from lung carcinoma was significantly higher than in the non-cancerous samples (Sample Nos. 46-50, 90-93, 96-99, Table 2, “Tissue samples in testing panel”). Notably, an over-expression of at least 5 fold was found in 4 out of 15 adenocarcinoma samples, in 10 out of 16 squamous cell carcinoma samples, in 2 out of 4 large cell carcinoma samples, and in 7 out of 8 small cell carcinoma samples.

Statistical analysis was applied to verify the significance of these results, as described below.

The P value for the difference in the expression levels of SOX2 transcripts detectable by SEQ ID NO: 1594 in lung cancer samples versus the normal lung samples was determined by T test as 4.38E-05.

A threshold of 5 fold overexpression was found to differentiate between cancer and normal samples with a P value of 8.09E-04, as checked by Fisher's exact test.

The above values demonstrate statistical significance of the results.

According to the present invention, SOX2 is a non-limiting example of a marker for diagnosing lung cancer. The SOX2 marker of the present invention can be used alone or in combination, for various uses, including but not limited to, prognosis, prediction, screening, early diagnosis, therapy selection and treatment monitoring of lung cancer. Although optionally any method may be used to detect overexpression and/or differential expression of this marker, preferably a NAT-based technology is used. Therefore, optionally and preferably, any nucleic acid molecule capable of selectively hybridizing to SOX2 as previously defined is also encompassed within the present invention. Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: SOX2-forward primer (SEQ ID NO: 1592): GGCGGCGGCAGGAT; and SOX2-Reverse primer (SEQ ID NO: 1593): GTCGGGAGCGCAGGG.

The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: SOX2-amplicon (SEQ ID NO: 1594): GGCGGCGGCAGGATCGGCCAGAGGAGGAGGGAAGCGCTTTTTTTGATCCTGATTCCAGTTTGCCTCTCTCTTTTTTTCCCCCAAATTATTCTTCGCCTGATTTTCCTCGCGGAGCCCTGCGCTCCCGAC.

According to other preferred embodiments of the present invention, SOX2 or a fragment thereof comprises a biomarker for detecting lung cancer. Optionally and more preferably, SOX2 splice variants, as depicted in SEQ ID NO:1595 (e.g., variant no. 0), or a fragment thereof comprise a biomarker for detecting lung cancer. Optionally and more preferably, the fragment of SOX2 comprises segment_TAA seg 2 (SEQ ID NO: 1597). Also optionally and more preferably, any suitable method may be used for detecting a fragment such as SOX2_segment_TAA seg 2 (SEQ ID NO: 1597) for example. Most preferably, NAT-based technology is used, such as any nucleic acid molecule capable of specifically hybridizing with the fragment. Optionally and most preferably, a primer pair is used for obtaining the fragment.

According to still other preferred embodiments, the present invention optionally and preferably encompasses any amino acid sequence or fragment thereof encoded by a nucleic acid sequence corresponding to SOX2 as described above, including but not limited to SEQ ID NO: 1596. Any oligopeptide or peptide relating to such an amino acid sequence or fragment thereof may optionally also (additionally or alternatively) be used as a biomarker.

The present invention also optionally encompasses antibodies capable of recognizing, and/or being elicited by, such oligopeptides or peptides.

The present invention also optionally and preferably encompasses any nucleic acid sequence or fragment thereof, or amino acid sequence or fragment thereof, corresponding to SOX2 as described above, optionally for any application.

Expression of Plakophilin 1 (Ectodermal Dysplasia/Skin Fragility Syndrome) (PKP1)-[HSB6PR] Transcripts which are Detectable by the Amplicon as Depicted in SEQ ID NO:1600 in Normal and Cancerous Lung Tissues

Expression of PKP1 transcripts detectable by SEQ ID NO:1600 (e.g., variants no. 0, 5 and 6, represented by SEQ ID NOs: 1601-1603) was measured by real time PCR. In parallel, the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323 (SEQ ID NO:1713); amplicon: SEQ ID NO:1471), HPRT1 (GenBank Accession No. NM_000194 (SEQ ID NO:1714); amplicon: SEQ ID NO:1468), Ubiquitin (GenBank Accession No. BC000449 (SEQ ID NO:1711); amplicon: SEQ ID NO:1474) and SDHA (GenBank Accession No. NM_004168 (SEQ ID NO:1712); amplicon: SEQ ID NO: 1477), was measured similarly. For each RT sample, the expression of SEQ ID NO: 1600 was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 47-50, 90-93, 96-99, Table 2, “Tissue samples in testing panel” above), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples.

FIG. 61 is a histogram showing over-expression of the above-indicated PKP1 transcripts in cancerous lung samples relative to the normal samples. The number and percentage of samples that exhibit at least 7 fold over-expression, out of the total number of samples tested, is indicated at the bottom.

As is evident from FIG. 61, the expression of PKP1 transcripts detectable by SEQ ID NO: 1600 in cancer samples originating from lung carcinoma was significantly higher than in the non-cancerous samples (Sample Nos. 46-50, 90-93, 96-99, Table 2, “Tissue samples in testing panel”). Notably, an over-expression of at least 7 fold was found in 11 out of 16 squamous cell carcinoma samples, and in 1 out of 4 large cell carcinoma samples.

Statistical analysis was applied to verify the significance of these results, as described below.

The P value for the difference in the expression levels of PKP1 transcripts detectable by SEQ ID NO: 1600 in lung cancer samples versus the normal lung samples was determined by T test as 3.18E-03.

A threshold of 7 fold overexpression was found to differentiate between cancer and normal samples with a P value of 3.50E-02, as checked by Fisher's exact test.

The above values demonstrate statistical significance of the results.
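The threshold test can be illustrated with a short Python sketch of Fisher's exact test on a 2x2 table (cancer or normal samples versus above or below the fold threshold). The text does not state whether a one-sided or two-sided test was used, so the two-sided form here is an assumption, and the function name and counts are hypothetical. The counts are not taken from the experiment and are not expected to reproduce the reported P values.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    # (a small relative tolerance guards against floating point ties).
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


# Hypothetical counts for illustration only:
# rows are cancer and normal samples, columns are above and below the threshold.
cancer_above, cancer_below = 9, 3
normal_above, normal_below = 1, 7
p_value = fisher_exact_two_sided(cancer_above, cancer_below, normal_above, normal_below)
print(f"P value: {p_value:.3g}")
```

An exact test is a natural choice here because the number of normal samples is small, which makes large-sample approximations unreliable.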

According to the present invention, PKP1 is a non-limiting example of a marker for diagnosing lung cancer. The PKP1 marker of the present invention can be used alone or in combination, for various uses, including but not limited to, prognosis, prediction, screening, early diagnosis, therapy selection and treatment monitoring of lung cancer. Although optionally any method may be used to detect overexpression and/or differential expression of this marker, preferably a NAT-based technology is used. Therefore, optionally and preferably, any nucleic acid molecule capable of selectively hybridizing to PKP1 as previously defined is also encompassed within the present invention. Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: PKP1-forward primer (SEQ ID NO: 1598): CCCCAGACTCTGTGCACTTCA; and PKP1-Reverse primer (SEQ ID NO: 1599): TGGGCTCTGCTCTGTCTTAGTGTA.

The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: PKP1-amplicon (SEQ ID NO: 1600): CCCCAGACTCTGTGCACTTCAGACCAGCAGCAGCAGGAGGGCTCCCGAGGGCCTTATGAGAAAACCTGTGTGGACATCCCTTGGTGTACACTAAGACAGAGCAGAGCCCA
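The relationship between the primer pair and the amplicon can be checked with a short Python sketch: an amplicon is expected to begin with the forward primer and to end with the reverse complement of the reverse primer. The sequences are copied from SEQ ID NOs: 1598-1600 as listed above; the function names are illustrative only.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def amplicon_matches_primers(amplicon, forward, reverse):
    """True if the amplicon starts with the forward primer and ends with the
    reverse complement of the reverse primer."""
    return amplicon.startswith(forward) and amplicon.endswith(
        reverse_complement(reverse)
    )


FORWARD_PRIMER = "CCCCAGACTCTGTGCACTTCA"  # SEQ ID NO: 1598
REVERSE_PRIMER = "TGGGCTCTGCTCTGTCTTAGTGTA"  # SEQ ID NO: 1599
# SEQ ID NO: 1600, as listed in the text.
AMPLICON = "CCCCAGACTCTGTGCACTTCAGACCAGCAGCAGCAGGAGGGCTCCCGAGGGCCTTATGAGAAAACCTGTGTGGACATCCCTTGGTGTACACTAAGACAGAGCAGAGCCCA"

print(amplicon_matches_primers(AMPLICON, FORWARD_PRIMER, REVERSE_PRIMER))
```

A check of this kind is a simple way to catch transcription errors when primer and amplicon sequences are copied between documents.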

According to other preferred embodiments of the present invention, PKP1 or a fragment thereof comprises a biomarker for detecting lung cancer. Optionally and more preferably, PKP1 splice variants, as depicted in SEQ ID NOs: 1601-1603 (e.g., variants no: 0, 5 and 6), or a fragment thereof comprise a biomarker for detecting lung cancer. Optionally and more preferably, the fragment of PKP1 comprises segment_TAA seg 34-SEQ ID NO: 1608. Also optionally and more preferably, any suitable method may be used for detecting a fragment such as PKP1_segment_TAA seg 34-SEQ ID NO: 1608 for example. Most preferably, NAT-based technology is used, such as any nucleic acid molecule capable of specifically hybridizing with the fragment. Optionally and most preferably, a primer pair is used for obtaining the fragment.

According to other preferred embodiments of the present invention, PKP1 splice variants containing the unique segment_8 as depicted in SEQ ID NO: 1607, for example variant 6, as depicted in SEQ ID NO: 1603, are suitable as biomarkers for detecting lung cancer.

According to still other preferred embodiments, the present invention optionally and preferably encompasses any amino acid sequence or fragment thereof encoded by a nucleic acid sequence corresponding to PKP1 as described above, including but not limited to SEQ ID NOs: 1604-1606. Any oligopeptide or peptide relating to such an amino acid sequence or fragment thereof may optionally also (additionally or alternatively) be used as a biomarker.

The present invention also optionally encompasses antibodies capable of recognizing, and/or being elicited by, such oligopeptides or peptides.

The present invention also optionally and preferably encompasses any nucleic acid sequence or fragment thereof, or amino acid sequence or fragment thereof, corresponding to PKP1 as described above, optionally for any application.

Combined Expression of 12 Sequences (SEQ ID NOs: 1480, 1517, 1529, 1532, 1558, 1574, 1594, 1600, 1616, 1619, 1622, 1625) in Normal and Cancerous Lung Tissues

Expression of several transcripts detectable by SEQ ID NOs: 1480, 1517, 1529, 1532, 1558, 1574, 1594, 1600, 1616, 1619, 1622, 1625 was measured by real time PCR (the expression of each SEQ ID was checked separately). In parallel, the expression of four housekeeping genes, PBGD (GenBank Accession No. BC019323 (SEQ ID NO: 1713); amplicon: SEQ ID NO: 1471), HPRT1 (GenBank Accession No. NM_000194 (SEQ ID NO: 1714); amplicon: SEQ ID NO: 1468), Ubiquitin (GenBank Accession No. BC000449 (SEQ ID NO: 1711); amplicon: SEQ ID NO: 1474) and SDHA (GenBank Accession No. NM_004168 (SEQ ID NO: 1712); amplicon: SEQ ID NO: 1477), was measured similarly. For each RT sample, the expression of SEQ ID NOs: 1480, 1517, 1529, 1532, 1558, 1574, 1594, 1600, 1616, 1619, 1622, 1625 was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 47-50, 90-93, 96-99, Table 2, “Tissue samples in testing panel”, above), to obtain a value of fold up-regulation for each sample relative to the median of the normal PM samples.

FIG. 62 is a histogram showing over-expression of the above-indicated transcripts in cancerous lung samples relative to the normal samples. The number and percentage of samples that exhibit at least 10 fold over-expression of at least one of the SEQ IDs, out of the total number of samples tested, is indicated at the bottom.

As is evident from FIG. 62, an over-expression of at least 10 fold in at least one of the SEQ IDs was found in 15 out of 15 adenocarcinoma samples, 15 out of 16 squamous cell carcinoma samples, 4 out of 4 large cell carcinoma samples, and in 8 out of 8 small-cell samples.
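The combined criterion (at least 10 fold over-expression of at least one of the 12 amplicons) and the resulting tally can be sketched in Python as follows. The function name and the example fold values are hypothetical; the per-histology counts are those reported in the paragraph above.

```python
def panel_positive(fold_by_seq_id, threshold=10.0):
    """True if at least one marker in the panel reaches the fold threshold."""
    return any(fold >= threshold for fold in fold_by_seq_id.values())


# Hypothetical fold values for one sample, keyed by amplicon SEQ ID NO.
example_sample = {1480: 2.1, 1600: 12.5, 1625: 0.8}
print(panel_positive(example_sample))

# Counts reported in the text: (samples positive, samples tested) per histology.
reported = {
    "adenocarcinoma": (15, 15),
    "squamous cell carcinoma": (15, 16),
    "large cell carcinoma": (4, 4),
    "small cell": (8, 8),
}
positive = sum(p for p, _ in reported.values())
tested = sum(n for _, n in reported.values())
print(f"{positive} of {tested} cancer samples ({positive / tested:.1%})")
# prints "42 of 43 cancer samples (97.7%)"
```

Because a sample is called positive when any single marker crosses the threshold, adding markers to the panel can only keep or raise the number of cancer samples detected. The same holds for normal samples, which is why the comparison against the normal samples remains necessary.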

Statistical analysis was applied to verify the significance of these results, as described below. A threshold of 10 fold overexpression of at least one of the amplicons as depicted in SEQ ID NOs: 1480, 1517, 1529, 1532, 1558, 1574, 1594, 1600, 1616, 1619, 1622, 1625 was found to differentiate between cancer and normal samples with a P value of 2.37E-08, as checked by Fisher's exact test.

The above values demonstrate statistical significance of the results.

Kits and Diagnostic Assays and Methods

The markers described with regard to any of the Examples above can be used alone, in combination with other markers described above, and/or with other entirely different markers, including but not limited to UbcH10 (see U.S. Patent Application Nos: 60/535,904 and 60/572,122; attorney refs: 27080 and 28045, filed on Jan. 13 and May 19, 2004, respectively), Troponin (see U.S. Patent Application No: 60/539,129; attorney ref: 26940), Sim2 (see PCT Application No. WO 2004/012847), PE-10 (SP-A), TTF-1, Cytokeratin 5/6, to aid in the diagnosis of lung cancer. All of these applications are hereby incorporated by reference as if fully set forth herein. These markers can be used in combination with other markers for a number of uses, including but not limited to, prognosis, prediction, screening, early diagnosis, therapy selection and treatment monitoring of lung cancer, and also optionally including staging of the disease. Used together, they may provide more information for the diagnostician, increasing the percentage of true positive and true negative diagnoses and decreasing the percentage of false positive or false negative diagnoses, as compared to the results obtained with a single marker alone.

Assays and methods according to the present invention, as described above, include but are not limited to, immunoassays, hybridization assays and NAT-based assays. The combination of the markers of the present invention with other markers described above, and/or with other entirely different markers to aid in the diagnosis of lung cancer could be carried out as a mix of NAT-based assays, immunoassays and hybridization assays. According to preferred embodiments of the present invention, the assays are NAT-based assays, as described for example with regard to the Examples above.

In yet another aspect, the present invention provides kits for aiding a diagnosis of lung cancer, wherein the kits can be used to detect the markers of the present invention. For example, the kits can be used to detect any one or combination of markers described above, which markers are differentially present in samples of lung cancer patients and normal patients. The kits of the invention have many applications. For example, the kits can be used to differentiate if a subject has a small cell lung cancer, non-small cell lung cancer, adenocarcinoma, bronchoalveolar-alveolar, squamous cell or large cell carcinomas or has a negative diagnosis, thus aiding a lung cancer diagnosis. In another example, the kits can be used to identify compounds that modulate expression of the markers in in vitro lung cells or in vivo animal models for lung cancer.

In one embodiment, a kit comprises: (a) a substrate comprising an adsorbent thereon, wherein the adsorbent is suitable for binding a marker, and (b) a washing solution or instructions for making a washing solution, wherein the combination of the adsorbent and the washing solution allows detection of the marker as previously described.

Optionally, the kit can further comprise instructions for suitable operational parameters in the form of a label or a separate insert. For example, the kit may have standard instructions informing a consumer/kit user how to wash the probe after a sample of seminal plasma or other tissue sample is contacted on the probe.

In another embodiment, a kit comprises (a) an antibody that specifically binds to a marker; and (b) a detection reagent. Such kits can be prepared from the materials described above.

In either embodiment, the kit may optionally further comprise a standard or control information, and/or a control amount of material, so that the test sample can be compared with the control information standard and/or control amount to determine if the test amount of a marker detected in a sample is a diagnostic amount consistent with a diagnosis of lung cancer.

It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable subcombination.

Although the invention has been described in conjunction with specific embodiments thereof, it is evident that many alternatives, modifications and variations will be apparent to those skilled in the art. Accordingly, it is intended to embrace all such alternatives, modifications and variations that fall within the spirit and broad scope of the appended claims. All publications, patents and patent applications mentioned in this specification are herein incorporated in their entirety by reference into the specification, to the same extent as if each individual publication, patent or patent application was specifically and individually indicated to be incorporated herein by reference. In addition, citation or identification of any reference in this application shall not be construed as an admission that such reference is available as prior art to the present invention.

1-23. (canceled)

24. An isolated polypeptide comprising the amino acid sequence set forth in SEQ ID NO: 1795.

25. The polypeptide of claim 24, wherein said polypeptide consists of the amino acid sequence set forth in SEQ ID NO: 1795.

26. An isolated polypeptide comprising an amino acid sequence at least 95% identical to SEQ ID NO: 1300.

27. An isolated polypeptide consisting of the amino acid sequence of SEQ ID NO: 1793.

28. A monoclonal or polyclonal antibody that specifically binds to an epitope in a polypeptide of claim 24, or an epitope-binding fragment thereof.

29. The antibody of claim 28 that specifically binds to an epitope in a polypeptide consisting of the amino acid sequence of any one of SEQ ID NOs: 1300 or 1795.

30. A kit for detecting lung cancer, comprising the antibody of claim 28.

31. The kit of claim 30, wherein said kit further comprises at least one immunoassay reagent.

32. The kit of claim 31, wherein said immunoassay reagent is selected from the group consisting of an enzyme linked immunosorbent assay (ELISA), an immunoprecipitation assay, an immunofluorescence analysis, an enzyme immunoassay (EIA), a radioimmunoassay (RIA), and a Western blot analysis.

33. A method for detecting lung cancer, comprising detecting overexpression of the polypeptide comprising the polypeptide sequence with the amino acid sequence of SEQ ID NOs: 1300 or 1795 in a sample.

34. The method of claim 33, wherein detecting cancer comprises detecting the presence or severity of the cancer, prognosis, prediction, screening, early diagnosis, staging, treatment selection, treatment monitoring.

35. A biomarker for detecting lung cancer comprising a polypeptide with the amino acid sequence of claim 25 marked with a label.